Code Inflation (2015) [pdf] (spinroot.com)

Excellent analysis.
I'd like to add that "growth over time" also solves the "No Silver Bullet"[1] paradox: while Brooks's analysis[2] seems obviously correct, we also obviously have code that seems orders of magnitude larger than it has any business being[3].
As far as I can tell, Brooks doesn't consider cumulative effects of growth over time, but only a single iteration of analysis → development. So a 20% difference in efficiency is just a 20% difference in outcome. However, that same 20% inefficiency per iteration yields a 40x difference in outcome over 20 years. Compound interest.
[1] https://en.wikipedia.org/wiki/No_Silver_Bullet
[2] "there is no single development, in either technology or management technique, which by itself promises even one order of magnitude [tenfold] improvement within a decade in productivity, in reliability, in simplicity.
[3] MS Office is >400MLOC. Xerox PARC had "personal computing" in ~20KLOC. Even assuming MS Office is 100x better at "officing", that leaves 99.5% of the code unaccounted for. Similar analyses can be done for web browsers (WWW.app, 5KLOC; Mozilla, 15MLOC), other operating systems, etc.
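To make the compound-interest arithmetic above concrete, here is a minimal C sketch (the 20%-per-iteration and 20-iteration figures are just the ones used in the comment):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* 20% extra growth per iteration, compounded over 20 yearly iterations */
        double factor = pow(1.2, 20.0);
        printf("1.2^20 = %.1f\n", factor);   /* ~38.3, i.e. roughly 40x */
        return 0;
    }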
Brooks's argument, if I remember right, was about humans and software always trying to push the envelope. So it is always going to be hard.
The inflationary phenomenon described here is more about incremental decorative and accidental complexity. The root cause is not the enterprising human drive to boldly go where no man has gone before.
Looking at the examples of adding headers and command-line bells and whistles, I wonder what human trait drives this? Those seem like (product-)management and architecture/framework-type decisions, possibly driven by a need to have things uniform and under control.
Moore's law is tiring out (it is more obvious on the data center side than in devices, yes). Will we become more frugal again?
[2] compilers, debuggers, IDE, unit tests, configuration management, PaaS, IaaS...
We could argue whether it is half a fold or two folds; it depends on the day.
Code inflation nicely cancels out gains from Moore's law. But there's also another interesting phenomenon seen in embedded hardware - it seems that each generation of devices has more (exponentially more?) computing power on-board, while the user-facing functionality stays the same or even slightly degrades. For example, the functionality of today's fridges, kettles and washing machines is equivalent to those made 20 years ago, but today's versions will break down faster and in ways nigh impossible to fix by yourself.
We're about to have Android running on toasters. And I can't stop myself from asking the question - why?
> while the user-facing functionality stays the same or even slightly degrades
I would disagree with that on so many points. Today, more than ever we have massive differences. "Hey Siri/Google", the camera functionality is just incomparable, the maps, ...
In the case of consumer white goods, the business case is that expensive mechanical components and security mechanisms are replaced by electronic ones that are cheaper. And indeed, adjusting for inflation, today's white goods are far cheaper than they ever were. This is happening in power adapters, but also in washing machines and kettles. It means that half the components only exist in a virtual sense, and you'd need half the design, a plastic molding factory, and a master's degree to have any hope in hell of fixing them. But they're 1/10th to 1/5th of your monthly pay and last 2-10 years, so why bother?
But the story is the same at a high level for everything from cell phone radios to motor controllers for washing machines. Virtual components, simulated in microcontrollers, are far cheaper (and far less repairable) than real components ever will be.
I bother, because I don't like to throw away a perfectly fine appliance only because some short blew out something on the motherboard. Nor do I want to pay for a new one every 2 - 5 years when the hardware could perfectly well last for 10-20 years. In a perfect world with perfect recycling, I wouldn't mind that much, but as it is today, it's only a way to make me spend more and trash the planet more.
Also, I don't feel like the price of appliances was dropping over my lifetime, so I have to ask - where do those apparent savings go? They're definitely not being passed on to consumers.
> I would disagree with that on so many points. Today, more than ever we have massive differences. "Hey Siri/Google", the camera functionality is just incomparable, the maps, ...
I'll grant you the camera, because chips and algorithms do get better. Siri/Google doesn't really feel like that much of an achievement over what was possible 10 years ago, except that nobody tried to build that product then and smartphones weren't exactly popular. As for maps, I'll only point to the Google Maps application, which has been steadily degrading in quality and functionality for the past 5+ years...
Appliances perhaps don't feel like they're dropping in price because of inflation (i.e., the sticker price is higher) and added features.
If you correct for those, they are.
> But they're 1/10th to 1/5th of your monthly pay and last 2-10 years, so why bother?
2-10 years is an extremely short lifespan for white goods; 20-30 years is more like it, and 40-50 years is not uncommon.
> In the case of consumer white goods, the business case is that expensive mechanical components and security mechanisms are replaced by electronic ones that are cheaper.
It's not just that, but even mechanical components are deliberately made weaker to use less material and thus cost less. Sometimes manufacturers push that boundary a little too far, Samsung's "exploding" washing machines being one of the latest examples of this.
Even the industry admits that white goods aren't built to last as long as they were the past. 40-50 years is unreasonable now. I'd be happy with 20.
Which reminds me of this classic http://www.danielsen.com/jokes/objecttoaster.txt
I think it's partly talent availability. There are lots of android devs, but very few who can write microcontroller code in ASM or super minimal C.
If you can run android on it, you can increase the size of your hiring pool a hundred fold. That matters a lot for big companies trying to ship products at scale.
Of course I don't get why a toaster or a fridge really needs a CPU at all, but that's another matter.
To bring us closer to this future:
Some days I'm inclined to rant about software bloat. I've often quoted or paraphrased a bit of dialogue from the Scott Meyer novel _Off to Be the Wizard_ that's kind of funny but also pretty sad:
Phillip (time traveler from 1984): What on earth can a person do with 4 gigabytes of RAM?
Martin (from 2012): Upgrade it immediately.
But on the other hand, some of what might be perceived as bloat is really useful, even necessary, to someone. For example, show me a "light" desktop operating system, Unix desktop environment, or web browser, and I'll probably tell you that many of my friends couldn't use it, because they're blind. (As for me, I'm legally blind, but I have enough sight to read a screen up close, so I don't need a screen reader, but I often use one.) Accessibility requires extra code. In Windows 10, UIAutomationCore.dll is about 1.3 MB, and will no doubt get bigger as Microsoft continues to improve the accessibility of Windows. But you can't write that off as bloat.
Elsewhere on this thread, there was some discussion of microcontroller software versus Android. A user interface written for a microcontroller with a screen is inaccessible to a blind person, except by memorizing a sequence of button presses, if the UI is simple enough and the device has physical buttons in the first place. But if the device is running Android, the infrastructure for accessibility is there; just add a text-to-speech engine and TalkBack (edit: and make sure the application follows accessibility guidelines). That ain't gonna fit in 1 MB or less of flash and 512K or less of SRAM. So sometimes we may be inclined to rant about bloat, but there's actual progress, too.
> Accessibility requires extra code. In Windows 10, UIAutomationCore.dll is about 1.3 MB, and will no doubt get bigger as Microsoft continues to improve the accessibility of Windows. But you can't write that off as bloat.
I'm convinced that's not true. Consider the case of CSS and ARIA roles. You could very easily write CSS/JS components defined on roles using CSS 2.1 attribute selectors, and accessibility just follows naturally. For instance, define your tabs with role="tab" (and their panels with role="tabpanel"), and screen readers immediately understand them, while for the visual design your CSS and JS just select on the appropriate roles. You then use classes only for non-semantic concerns, like font and colour, instead of using classes for everything, as is currently standard.
So you're not duplicating code, you're writing better semantic markup once which can be properly interpreted multiple ways in different media.
Not how it's currently done, but there's no reason it can't be done that way, even for desktop UIs.
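A minimal sketch of what that could look like, with made-up tab labels, ids, and styling; the role and aria-* attributes and the attribute selectors are standard ARIA/CSS, everything else is illustrative:

    <!-- Semantics carried by ARIA roles rather than presentational classes -->
    <div role="tablist">
      <button role="tab" aria-selected="true"  aria-controls="panel-general">General</button>
      <button role="tab" aria-selected="false" aria-controls="panel-advanced">Advanced</button>
    </div>
    <div role="tabpanel" id="panel-general">General settings...</div>
    <div role="tabpanel" id="panel-advanced" hidden>Advanced settings...</div>

    /* Visual design keyed on the same roles via attribute selectors */
    [role="tablist"] { display: flex; }
    [role="tab"] { padding: 0.5em 1em; background: none; border: none; }
    [role="tab"][aria-selected="true"] { border-bottom: 2px solid; }
    [role="tabpanel"][hidden] { display: none; }

A screen reader gets the tab semantics from the roles, and the stylesheet gets its hooks from the very same attributes, so nothing is written twice.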
I don't disagree with your point, but I think you are overestimating how much code some of that takes.
The reality is that we use so much code because it is easier that way. And the added ease is very hard to remove.
Then there is data. A single picture takes more storage than my first few computers had. Compression isn't magic.
True. It's particularly sad that so many desktop apps these days bundle a full browser engine (through Electron or the like) because it's so convenient. And I hadn't really thought about how much space photos take up because (gasp) I don't have a photo library. Music, though? That eats up space too.
Yeah, a ton of data goes to music and video. On the plus side, I no longer spend a ton of physical space on these. Not sure which is more wasteful: tons of movies and music I never get to on my computer, or bookshelves of the same.
And I still have a fair number of books.
There is a small irony here in the author's use of histogram plotting that performs meaningless perspective rendering - one of the features that has contributed to document-creation software bloat.
Perspective is not an expensive feature, and I really doubt it contributes significantly to code inflation.
Pulling in libraries for a tiny feature without considering the whole size impact is what is driving code inflation. Now and then I find a very tiny project whose executable is big just because it links against Boost for a couple of classes.
I am sure you are right about the big picture. Library inclusion has leverage, and library code is written for the general case. Templates provide a way to specialize the code that is built, but I imagine that cross-cutting concerns ultimately limit the degree to which you can create generic code that compiles to exactly what you want, and nothing more.
> Perspective is not an expensive feature, and I really doubt it contributes significantly to code inflation.
It depends on how it's been implemented.
All of Boost?
It's modular, so you should just have to pull in what you use.
In practice it is not very modular: parts of the library like Asio are heavily dependent on other parts of Boost, and given that bootstrapping can be an annoyance at times, I have seen a lot of projects that decided to include the whole library.
I work on some pretty heinous legacy code and build systems. I often measure my success by the amount of code I remove rather than the amount I write.
> It sometimes seems as if [software] has just gotten bigger, but not safer. So why is that?
That's the premise of the article, but it offers no proof of it. And I can just as well say: "To me it seems software in general has gotten much safer over the years, because I can barely remember the last time a program crashed."
Results of code review of false.c:
* Use of undocumented magic numbers (1)
* No use of getter/setter pattern
* Inflexible design (datatype of result is fixed, no template pattern implemented)
* Manual memory management (no garbage collection used)
* No infrastructure for automated testing included
* No unit tests available
* Code has not changed for years (code smell!); probably stale code, to be removed in next release.
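For scale, the entire program under review can be a minimal sketch like this; all false(1) has to do is exit with a non-zero status:

    /* false.c - do nothing, unsuccessfully */
    int main(void) {
        return 1;   /* the undocumented magic number in question */
    }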
The article briefly mentions cat, but there's more to say there: https://news.ycombinator.com/item?id=11341175
"the probability of non-trivial defects increases with code size."
Intuitively this feels correct; I wonder if anybody has studied it.
Why does he specifically say "non-trivial" defects? The probability of any defect surely increases with code size?
Perhaps to forestall the suggestion that the growth is in trivial bugs, such as a typo in the text printed in response to the -h option.
Maybe because these are the most relevant ones? Trivial defects are easily detected and fixed. Or else they're not trivial.
In what circumstances is it still useful to have a true/false executable in addition to the shell builtin?
In cases where you might not be using a shell.
Like, say you have a program that uses a user-specified program + arguments, like parallel or find -- but for evaluating a boolean condition.
true/false would help.
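A minimal C sketch of that situation: a tool like find or parallel hands the user-specified command straight to exec, so no shell (and no builtin) is ever involved, and /bin/true or /bin/false is what actually runs. The program name and default command below are made up for illustration:

    /* run_pred.c - run a user-specified predicate command without a shell */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        const char *cmd = (argc > 1) ? argv[1] : "/bin/true";  /* e.g. /bin/false */
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            return 2;
        }
        if (pid == 0) {
            execl(cmd, cmd, (char *)NULL);  /* exec the binary directly, no shell */
            _exit(127);                     /* exec failed */
        }
        int status;
        waitpid(pid, &status, 0);
        int ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
        printf("%s -> %s\n", cmd, ok ? "true" : "false");
        return ok ? 0 : 1;
    }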
The undeniable fact is that there will be a case for any scenario, and that fact is the essential driver for code inflation. You can't shed the code since that code is there for a potential case; and you can't help adding code as you keep discovering/imagining new cases.
This is where a bit of wisdom differentiates people. There is no winning at trying to match exponential generality with our finite ability. So wise people seek the essence and accept the reality that there will always be cases left uncovered; the important thing is not to lose the essence. Burying the essence under a mountain of glut is not much different from losing it, so the minimal approach is often more sensible than the bloated one. The essence covers 80% of the cases, and for the remaining 20% we cope.
In the case where you are not using a shell and need true/false, such as with parallel or find? Why not write your own true/false program then? In fact, I write my own parallel script every single time I really need parallel. It is not hard; it takes about the same time as going through the man page.
Okay.
But if anything is bloat here, it's the true/false shell builtin.
TLDR
Author rants about how the size of the executable for the Unix `true` command has increased "exponentially" from 0 to 22KB over ~30 years, for no apparent reason other than "because it can", referring to cosmology, of course.
"...true and false commands also don’t need an option that can invert the result, or one that would allow it to send its result by email to a party of your choice."
The article is humorous in nature, but not without a point.
OP, thanks for sharing the article.