Does any program really have an unlimited number of bugs?
I really enjoy reading Yegor’s blog. He brings quite interesting ideas regarding programming and business. But sometimes, the thoughts he shares there look a bit strange to me. I’m talking about one of the latest posts titled “Any Program Has an Unlimited Number of Bugs”.
I’m writing this in reply to Yegor’s points mentioned there. Please read them first. To focus on the key notes, let me quote the most sensitive part:
Let’s take this simple Java method that calculates a sum of two integers as an example:
int sum(int a, int b) {
return a + b;
}
How about these bugs:
- It doesn’t handle overflows
- It doesn’t have any user documentation
- Its design is not object-oriented
- It doesn’t sum three or more numbers
- It doesn’t sum double numbers
- It doesn’t cast long to int automatically
- It doesn’t skip execution if one argument is zero
- It doesn’t cache results of previous calculations
- There is no logging
- Checkstyle would complain since arguments are not final
Well, the points highlighted above make me really uncertain about what really Yegor meant. And since he didn’t answer me in the commentary section, I will put my thoughts on that in my own blog.
Let’s look closer at some of them.
It doesn’t have any user documentation
The question is, do you really think we need documentation for sum
function?
What will you put there? I cannot guess nothing other than It adds two
integers
. So what the reason for having it in our code?
In a previous project, I had a teammate who was a fan of docstrings. Every module, a class or a method should have a docstring, he said. The code he wrote looked like a school composition. You could not ship your code without his comments something like “add docs here and there plz”.
Since our product was developing rapidly, we had to change the code all the time. Surely, we used to forget to update those docstrings because it took extra time. It was even worse because the docstrings didn’t tell the truth anymore. Nobody believed them as a result.
What we need sum
documentation for? How should it look like? Who will read
this documentation? A user? I doubt it. A programmer? Even programmers need not
a machinery-generated HTML made of docstrings, but human-written manuals with
examples and snippets.
When you open a particular GitHub project, what do you do first? Reading the
code? No, you scroll down for README
or click on Wiki
link. Digging the
source code is the last thing you would make if something works not well.
Let’s recall what does one say on DOOM III source code and ID Software code standards in general:
Code should be self-documenting. Comments should be avoided whenever possible. Comments duplicate work when both writing and reading code. If you need to comment something to make it understandable it should probably be rewritten.
Its design is not object-oriented
We all know Yegor is passionate by OOP. But really, naming a function buggy just because it’s non-OOP designed is not fair.
What would happen if we had an OOP version of sum
function? I can imagine
something messy like this:
Adder adder = new Adder();
adder.setA(1);
adder.setB(2);
Integer result = adder.getResult() // 3
We’ve got lot more problems here. The number of lines grew up from 1 to 4
(compare to add(1, 2)
). In addition, now adder
object also keeps it’s own
state. How may I track whether .setA
was called or not? What will happen if I
forget to put .setB
in my code? Will it get a Null
value or an exception
raised?
What will happen if I share adder
object across multiple threads so they call
setA
, setB
and getResult
simultaneously? Nobody would predict the result.
Functional programming teaches us to avoid having state and keep the things simple. Having a small function without any state rather than manipulating with objects will pay off for sure.
It doesn’t sum three or more numbers
Surely, if you a Java programmer, you’ve got a problem. Because there is nothing
you can do without rewriting the code. But if you are aware of FP-style and you
have your add
function, you simply do:
(reduce #'add (list 1 2 3) :initial-value 0) ;; common lisp
(reduce add 0 [1 2 3]) ;; clojure
or, in Python:
reduce(add, [1, 2, 3], 0)
This is definitely the right way with reduce
also known as fold
. You don’t
need to write a function that adds million of integers. Instead, you add just
two! And reduce
will care of how to spread it across a sequence of data.
That’s normal to focus on simplest cases and primitive operations first. Then you build abstractions with high-order functions based on primitives. It makes the code easier to understand and debug.
It doesn’t sum double numbers and It doesn’t cast long to int automatically
I don’t know if it is really a bug. On the one hand, it would be great to have a function that adds everything. Again, within any Lisp dialect or Python I may have it easily. Instead, with static typing you have to write more code.
But on the other hand, having exact integer
types might be some kind of
specific requirement made by design. It could be an utility function that adds
to years or any numbers that do not exceed a couple of thousands. So it’s
difficult to judge without a full context (I’ll say more on that below).
It doesn’t skip execution if one argument is zero
I really doubt it will increase the performance. Adding zero to an integer costs
really nothing on modern hardware. The compilers are smart enough to deal with
such cases. But putting an extra if
clause will definitely increase the number
of instructions because now your program makes a comparison step.
This is exactly what did Knuth say about preliminary optimization. You don’t
even have any metrics about what runs faster: adding zero of extra if
statement. You even don’t now how frequently your function will be called with
zero argument. I might be 1 time per 100k calls so your trick is made in vain in
that case.
It does not worth it, after all.
It doesn’t cache results of previous calculations
I’m sure any cache system brings as many problems as it solves. First, where would you store that cache? In application’s memory? In Memcache? User’s cookies?
Then, you program could sum million of numbers per second. Every cache call
needs al least O(C)
or O(logN)
time to lookup a value. You will slow down
your program dramatically with such approach when caching small functions.
There is no logging
The same claims are here: what exactly are you going to log? Who will read that logs and how often? For what purpose you want to store such kind of logs? Where would you store such amount of logs? Did you estimate the slowdown that you will bring to your system by adding logging? What if some of Ops team would misconfigure logging system and add email handler to that logger? Or sending SMS message?
Finally, this function is taken completely out of the scope so it’s impossible to judge on its quality. We should never examine a single function but only their composition instead. Every function has its own weaknesses and strengths. Linked together properly they build a system. In such a system, weakness of any specific function is supported by another function that validates data, filters input params, cleans up system symbols and so forth.
What I want exactly to say is there is nothing criminal in such a function that adds to integers. If it’s surrounded with another function that ensures that both numbers are really integers, there is no need to do all that validations again when adding them. In our projects, we usually separate our application on several levels. On the top level of the application, threre is one that validates the input data. If they are fine, functions on the next level operate on them as well.
In conclusion, please avoid using word “bug” when you talk about lack of
logging or caching. Probably you don’t need that stuff. Keep functions free of
having state and producing side effects. When you afraid of the data you use may
be incorrect (Nulls, overflows), don’t rush into adding nested if
s. You’d
rather update your validation logic first.
Нашли ошибку? Выделите мышкой и нажмите Ctrl/⌘+Enter
Yegor Bugayenko, 5th Jun 2017, link
Thanks for this analysis. You may find this article useful too: http://www.yegor256.com/201... Bugs may be not only functional, but also non-functional, and the area for non-functional bugs is way wider. Moreover, it's unlimited. Of course, you can argue with every one of them, but the point is that in some projects, some of them will make sense. And the amount of them is literally unlimited.
Eugene Nikolaev, 5th Jun 2017, link
Are you sure you enjoyed Yegor's blog?))
>> How may I track whether .setA was called or not?
Holy truth. That's why setters are bad. Use a constructor.
http://www.yegor256.com/201...
I guess OOP version should be like:
Integer sum = new Sum(1, 2)
But as long Integer is not an interface we cannot make that.
So the compromise version would be a sort of:
Integer sum = new Sum(1, 2).intValue()
Yes it may look unnatural with current tools/languages though. But it is still one line.
Ivan Grishaev, 5th Jun 2017, link , parent
Thank you, will read that paper.
Ivan Grishaev, 5th Jun 2017, link , parent
1. Yes, I'm sure. Because enjoying of something doesn't mean you consume only those content that you are agree with.
2. Regarding the constructor pattern you posted above, I had that pattern in my mind. But it still keeps a state inside an object. So I still cannot see any reason of keeping state when we need the result only.