Portability Is Reliability
evan.nemerson.com
Focusing all your might on a single platform can give you reliability and even better performance, and is less work.
Focusing on portability gives you a set of abstractions to build upon, but they take time, often reduce performance, and limit your capabilities. But... it can also give you a better mental model of the problem you're solving, and let you move faster when underlying details change.
Like everything, it has ups and downs, I think.
I don't find the defence in depth approach appealing, because it's never going to get you to 100% reliability. Rather than adding six swiss-cheese layers and hoping the holes don't line up, it would be better to spend more time and effort on a single really solid layer.
For the same reason, I don't find the argument for fixing bugs in C codebases at all compelling (at least if we're talking about "micro" bugs rather than overall architecture issues that might carry over to a future rewrite). Sure, there's a lot of C out there. Sure, a lot of the bugs in that C code are findable. But so what? However many bugs you uncover, you're unlikely to find them all, and even if you did then how would you know? So isn't fixing bugs in existing C codebases just throwing good money after bad? You know you're inevitably going to have to rewrite it all eventually anyway.
I don't understand. Wouldn't it mean that you leave software used by millions of users to rot while you rewrite in your newer better stack? Not fixing bugs, holes and regressions? How would that make things better? Maybe you have something else in mind, but I think you failed to convey it here.
Also, with a newer stack you will still make a bunch of logic errors that you had already fixed in your old ugly C code base. You're going to repeat that the hard way.
Don't get me wrong, I'm not against rewrites. But not fixing old code, until you are done with next version is irresponsible.
In a way I think we already live in such a world. Companies throw out a product, then do minimal maintenance until their newer, better version arrives. When it finally arrives it may have a few good qualities, but it is never without hindrance to users.
> Wouldn't it mean that you leave software used by millions of users to rot while you rewrite in your newer better stack? Not fixing bugs, holes and regressions? How would that make things better?
Whatever you do, the software will be full of bugs and holes until you rewrite it. So I think the users are best served by prioritising the actual rewrite rather than papering over the cracks.
> Also, with a newer stack you will still make a bunch of logic errors that you had already fixed in your old ugly C code base. You're going to repeat that the hard way.
Not so in my experience. The logic is what you port, so you end up with the same logic. But the "micro" bugs disappear, because that part is different between languages.
> I don't find the defence in depth approach appealing, because it's never going to get you to 100% reliability. Rather than adding six swiss-cheese layers and hoping the holes don't line up, it would be better to spend more time and effort on a single really solid layer.
But those layers are not part of your own work, they are external tools. Yes, it would be great if one of those would just catch all the bugs, but that's not happening.
So you have to choose either one single slice of Swiss cheese, or you work a bit on your plate to be able to add additional safeguards.
> Sure, a lot of the bugs in that C code are findable. But so what? However many bugs you uncover, you're unlikely to find them all, and even if you did then how would you know? So isn't fixing bugs in existing C codebases just throwing good money after bad?
Bug-free is not achievable in any language. But good enough, secure enough, functional enough are achievable goals.
For many existing code bases (especially for projects with clearly defined scope) good enough is more easily achieved by fixing up the C code than by a rewrite in a different language.
Had it been written in a different language from the beginning, it might have been cheaper overall, but past some point the rewrite will just consume more money to reach the same state.
> For many existing code bases (especially for projects with clearly defined scope) good enough is more easily achieved by fixing up the C code than by a rewrite in a different language.
I feel like this is because our standards for "good enough" are so low, partly because languages like C and C++ make it so hard to do any better. It's true that there is a high cost to switching languages for an existing project, but the cost we are paying for all of our foundational tools and libraries being built in a shaky, unsafe way is also huge.
Studies I've seen suggest that ~70% of security vulnerabilities in C and C++ codebases are bugs (memory safety, thread safety, undefined behaviour) that would be caught by static checks in other languages. Think how much time and effort goes into ensuring basic safety invariants in C codebases. Then think how many of the remaining logic bugs we could catch if that effort was spent entirely on finding them.
> never going to get you to 100% reliability
Both Atheist and Muslim SREs agree: Only god is 100% reliable.
Reasoning in non-absolute magnitudes is more effortful but usually more effective.
If 2 weeks of effort spent on the single layer would get it from 95% to 95.3% reliable, then you're likely better off with another layer. If 2 weeks of effort spent on the single layer would get it from 95% to 99.9% reliable, that seems like a wise choice. However, since your mental process for judging the reliability of a single layer is probably less than 99% reliable, adding another layer helps protect against unknown errors.
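A back-of-the-envelope sketch of that comparison, in C. It assumes each layer catches a given bug independently of the others, which real layers rarely do (they tend to share blind spots), so treat the result as an upper bound. The function name `combined_catch_rate` and the sample rates are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Probability that at least one of n layers catches a given bug,
 * ASSUMING each layer i catches it independently with probability
 * catch_rate[i]. Correlated layers will do worse than this. */
double combined_catch_rate(const double *catch_rate, size_t n)
{
    assert(catch_rate != NULL || n == 0);

    double miss = 1.0; /* probability that every layer misses */
    for (size_t i = 0; i < n; i++) {
        assert(catch_rate[i] >= 0.0 && catch_rate[i] <= 1.0);
        miss *= 1.0 - catch_rate[i];
    }
    return 1.0 - miss;
}
```

Under that assumption, a 95% layer plus a mediocre 50% layer gives 1 - 0.05 * 0.5 = 97.5%, which beats polishing the single layer from 95% to 95.3%, but not polishing it to 99.9%.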
> So isn't fixing bugs in existing C codebases just throwing good money after bad?
I'd agree, with this train of thought:
The author says "in C, writing reliable software is somewhere between extremely difficult and impossible." To me, that sounds like writing in C is 2%-10% reliable. If writing in Rust would be 40-80% reliable, that is a powerful argument for incrementally porting something to Rust if that thing will continue to need to change.
> However many bugs you uncover, you're unlikely to find them all
It is not specific to C/C++ - in any language there will be some bugs you're unlikely to find (before they are reported).
> So isn't fixing bugs in existing C codebases just throwing good money after bad? You know you're inevitably going to have to rewrite it all eventually anyway
If it took many years to write a big C codebase, then a rewrite will likely take significant time too (or you'll drop many useful features and/or introduce new bugs). And most software projects take more time than anticipated; rewrites are no exception. Isn't that a waste of money too, just another kind?
> It is not specific to C/C++ - in any language there will be some bugs you're unlikely to find (before they are reported).
Perhaps (though frankly I doubt it). But at the very least it's very possible to avoid the "everything is a security bug by default" phenomenon of C/C++ undefined behaviour; your program may still crash occasionally, but a clean crash is relatively benign compared to security bugs or even just silent data corruption.
> If it took many years to write a big C codebase, then a rewrite will likely take significant time too (or you'll drop many useful features and/or introduce new bugs). And most software projects take more time than anticipated; rewrites are no exception. Isn't that a waste of money too, just another kind?
If a rewrite is necessary, then the sooner you start the sooner you'll finish. I'm coming from a view that a) C/C++ code is always insecure and cannot be made secure b) as exploitation improves along with everything else, running insecure code will become increasingly untenable. You could disagree with either premise.
> If a rewrite is necessary, then the sooner you start the sooner you'll finish.
An 8-person team might easily rely on more C/C++ code than could be rewritten by them in 3 years. This jumps to 10-40 years if you include external Open Source codebases like Nginx.
So, the team needs to prioritize and to do things in steps. While they are rewriting some C/C++ code, they'll need to still maintain other code.
...and they need to solve enough business problems to keep the lights on.
> So, the team needs to prioritize and to do things in steps. While they are rewriting some C/C++ code, they'll need to still maintain other code.
Sure. But I don't think the kind of portability-for-reliability work described in the article is effective work that should be prioritised. Yes, it will catch some bugs, but there's not actually much value in that.
> For the same reason, I don't find the argument for fixing bugs in C codebases at all compelling (at least if we're talking about "micro" bugs rather than overall architecture issues that might carry over to a future rewrite). Sure, there's a lot of C out there. Sure, a lot of the bugs in that C code are findable. But so what? However many bugs you uncover, you're unlikely to find them all, and even if you did then how would you know? So isn't fixing bugs in existing C codebases just throwing good money after bad? You know you're inevitably going to have to rewrite it all eventually anyway.
You're completely correct! The Linux kernel maintainers should stop bugfixing. It will be faster to rewrite it in a safer language than to add a one-line bugfix :-/
Oracle should also throw away their database code. As you say, it's faster to rewrite their 30m LoC database than to add a few lines of bugfixes.
Wait, hang on ... what about my volvo s/wagen? The various ECUs in it are all written in C! Certainly I don't want them to fix problems and roll out a patch next week. I'll wait the 5 years it takes for them to rewrite the entire system.
Then, of course, there's airliners. You're completely correct that we should not fix the problems in their code; we can shutdown air travel for the 8 years or so it takes to rewrite the system.
Gee, I wonder why everyone isn't as far-sighted and enlightened as you are.
> You're completely correct! The Linux kernel maintainers should stop bugfixing. It will be faster to rewrite it in a safer language than to add a one-line bugfix :-/
Linux actually doesn't bother fixing a lot of the kind of bugs I'm talking about (thus their use of e.g. -fno-delete-null-pointer-checks) and doesn't bother being portable between different compilers.
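For readers who haven't run into it, here is a minimal sketch of the kind of code that flag is about. The `struct device` type and both function names are hypothetical, not taken from the kernel. In ISO C, dereferencing a null pointer is undefined behaviour, so once the pointer has been dereferenced an optimizing compiler is allowed to assume it is non-null and drop a later NULL check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type, for illustration only. */
struct device {
    int flags;
};

/* Problematic order: dev is dereferenced BEFORE the NULL check.
 * If dev is NULL the behaviour is already undefined at the first
 * line, so the compiler may treat the check below as dead code.
 * Only call this with a valid pointer. */
int device_flags_check_too_late(const struct device *dev)
{
    int flags = dev->flags;
    if (dev == NULL)
        return -1;
    return flags;
}

/* Portable order: check first, then dereference. This is
 * well-defined on every conforming compiler, with or without
 * special flags. */
int device_flags(const struct device *dev)
{
    if (dev == NULL)
        return -1;
    return dev->flags;
}
```

With a valid pointer both functions return the same value; only the second one has a defined result for NULL without relying on compiler-specific behaviour.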
> Oracle should also throw away their database code. As you say, it's faster to rewrite their 30m LoC database than to add a few lines of bugfixes.
Oracle only stays alive because of aggressive sales and legal teams. H2 is both more standards-compliant and better-performing (why do you think Oracle's license doesn't let you benchmark it?).
> Wait, hang on ... what about my volvo s/wagen? The various ECUs in it are all written in C! Certainly I don't want them to fix problems and roll out a patch next week. I'll wait the 5 years it takes for them to rewrite the entire system.
> Then, of course, there's airliners. You're completely correct that we should not fix the problems in their code; we can shutdown air travel for the 8 years or so it takes to rewrite the system.
Safety-critical software is not written in C in the sense of everyday C codebases. It might be written using C syntax, but that code will not be treated as normal C: specific (often non-optimizing) compilers will be used, analysis tools will be applied, particular coding policies will be applied, using arbitrary C libraries is right out...
>> You're completely correct! The Linux kernel maintainers should stop bugfixing. It will be faster to rewrite it in a safer language than to add a one-line bugfix :-/
>Linux actually doesn't bother fixing a lot of the kind of bugs I'm talking about (thus their use of e.g. -fno-delete-null-pointer-checks) and doesn't bother being portable between different compilers.
Correction, they don't bother fixing some of the bugs you are talking about. A lot of the other bugs that you are talking about (bugs due to the C language) are fixed. Your proposal was to not add any fixes due to bugs in the C language.
>> Oracle should also throw away their database code. As you say, it's faster to rewrite their 30m LoC database than to add a few lines of bugfixes.
>Oracle only stays alive because of aggressive sales and legal teams.
What does that have to do with your proposal? In fact, if as you say that Oracle is alive because of non-technical reasons, then your proposal that a rewrite is better than a bugfix is even more unreasonable - they can use their lock-in to spend a decade rewriting their core products.
They aren't doing this though.
> H2 is both more standards-compliant and better-performing (why do you think Oracle's license doesn't let you benchmark it?).
How is that relevant?
>> Wait, hang on ... what about my volvo s/wagen? The various ECUs in it are all written in C! Certainly I don't want them to fix problems and roll out a patch next week. I'll wait the 5 years it takes for them to rewrite the entire system.
>> Then, of course, there's airliners. You're completely correct that we should not fix the problems in their code; we can shutdown air travel for the 8 years or so it takes to rewrite the system.
>Safety-critical software is not written in C in the sense of everyday C codebases. It might be written using C syntax, but that code will not be treated as normal C: specific (often non-optimizing) compilers will be used, analysis tools will be applied, particular coding policies will be applied, using arbitrary C libraries is right out...
I've worked as a C programmer in safety-critical software. Specifically, in munitions control. The "safety" comes not from religious adherence to MISRA-like guidelines but from regulatory bodies who specify the process around QA, testing and official release of the software.
Changing from C to another language might help, but rewriting the entire product is out the window completely - the regulatory hurdles to re-certify and re-test alone means that such an attempt is bound to kill the rewrite effort outright.
I'm currently working in another field (C and C++ this time), and making small incremental changes is considered by the regulatory bodies as less risky than throwing everything away and restarting.
The meme "scrap C, rewrite it in $FOO" only applies to software that has very little impact on the world.
> Correction, they don't bother fixing some of the bugs you are talking about. A lot of the other bugs that you are talking about (bugs due to the C language) are fixed. Your proposal was to not add any fixes due to bugs in the C language.
They don't bother fixing large categories of C bugs, to the extent that Linux can't really be said to be a C program - rather it's a program written in an ad-hoc GCC-specific dialect. Not fixing bugs due to the C language (or rather, things that would be bugs if it was interpreted as a C program) is very much part of that.
> What does that have to do with your proposal?
I thought you were holding up Oracle as some paragon of technical excellence to be emulated, which it isn't.
> In fact, if as you say that Oracle is alive because of non-technical reasons, then your proposal that a rewrite is better than a bugfix is even more unreasonable - they can use their lock-in to spend a decade rewriting their core products.
I don't know what you're suggesting or advocating here. I'm not a businessperson and don't have any idea what's the most effective way for Oracle to make money. I do know from personal experience that if you want to make a good product, switching from Oracle to something written in a safer language can help.
> I'm currently working in another field (C and C++ this time), and making small incremental changes is considered by the regulatory bodies as less risky than throwing everything away and restarting.
Regulations are, sadly, often a long way behind what's actually effective.
> The meme "scrap C, rewrite it in $FOO" only applies to software that has very little impact on the world.
On the contrary, the kind of software that actually changes the world tends to be written in non-C. Where C tends to be used is in the kind of software that's a marginal replacement for non-software.
You say that C / C++ is bad. You assume that there are alternatives that are just better. I think you are wrong.
Why? The short answer is: there is no free lunch. There is no silver bullet.
C / C++ has huge pitfalls with its memory management. But it is also a well-designed programming language with a great balance between efficiency and comprehensibility (and all other features of a language).
Every design decision in a language is a trade-off. And the trade-offs in C / C++ are so well chosen that there was no easy improvement for decades. No low hanging fruit. A C programmer has to understand much more about memory management because there is no virtual machine / garbage collector / interpreter running which takes care of it. But for that you get efficiency that is not possible by design with the other approaches.
For me C / C++ made the best trade-off.
Until Rust came along. Rust wants to make the memory management 10% better than C++ with 0% loss in efficiency. That is huge. No one reached something similar in 40 years. I am sceptical because this is such an ambitious undertaking. If that really works in real life I will be impressed.
> C / C++ has huge pitfalls with its memory management. But it is also a well-designed programming language with a great balance between efficiency and comprehensibility (and all other features of a language).
Citation needed. I don't believe it's anywhere close to the Pareto frontier of language design, nor do I think we should reasonably expect it to be - it's fundamentally a language put together by two kids with no formal training, designed for programming a machine that had 256 kilobytes of RAM. It became popular due to some accidents of history (Unix, availability of free compilers) and the following network effects far more than the technical merits of the design.
> Every design decision in a language is a trade-off. And the trade-offs in C / C++ are so well chosen that there was no easy improvement for decades. No low hanging fruit. A C programmer has to understand much more about memory management because there is no virtual machine / garbage collector / interpreter running which takes care of it. But for that you get efficiency that is not possible by design with the other approaches.
That's nonsense IMO. ML was and remains an overwhelmingly better language, the performance of an incorrect program should be considered meaningless, and for all that C is supposedly more efficient, when I've actually seen a C program rewritten in an ML-family language the result has been substantially better performance.
Eric Raymond, not Linus Torvalds, said the "given enough eyeballs" quote. Unfortunately there aren't enough eyeballs, so bugs linger for years anyway.
I don't know, lately I've been thinking maybe there are plenty of eyeballs and there's just too much code. Do things really need to be as big and complicated as they are?
"Move fast and break things" induces bad code. Pascal's quip that "I have made this longer than usual because I have not had time to make it shorter" applies to code too; it takes time and effort to make code as concise as possible.
Portability is complexity. Complexity is diametrically opposed to reliability.
A big part of the author's argument is 'portability is simplicity'. That's the reason why it produces more reliable code.
What is meant by that? I understand it as: don't write platform-specific hacks. These rely on assumptions about how behavior that is undefined in general happens to work on one particular platform.
An example: sometimes you can read meaningful data from memory after a free() because no one overwrote it. On your platform in your situation this can work.
It is clear to everyone that this is a bad idea. It would be great to have a low-cost tool that tells you about such a bug. One class of tools for this is sanitizers. The author proposes an additional class of tools: testing your program by running it on another platform.
And why could this lead to more simplicity in the source code? A use-after-free bug comes from too much complexity. The programmer couldn't keep all the complexity in his or her mind. I bet that in general, well-written solutions without platform-specific assumptions look simpler and are easier for a human to understand.
In short: an extra bug-finding tool (cross-platform testing) finds more bugs, and fixing them leads to clearer code.
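A minimal sketch of the free() example above, in C. The helper names `copy_name` and `name_length` are made up for illustration. The buggy variant is left in a comment so that the code itself stays well-defined.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of name, or NULL if allocation
 * fails. The caller owns the result and must free() it. */
char *copy_name(const char *name)
{
    assert(name != NULL);

    size_t len = strlen(name) + 1; /* include the terminating NUL */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, name, len);
    return copy;
}

/* Buggy pattern, shown only as a comment:
 *
 *     char *tmp = copy_name(name);
 *     free(tmp);
 *     size_t n = strlen(tmp);   // use after free
 *
 * It may appear to work when the allocator happens to leave the
 * bytes untouched, and fail on another platform or allocator.
 *
 * Fixed version: finish every read before releasing the memory. */
size_t name_length(const char *name)
{
    char *tmp = copy_name(name);
    if (tmp == NULL)
        return 0;

    size_t n = strlen(tmp); /* tmp is still valid here */
    free(tmp);
    return n;
}
```

If you build the buggy variant with AddressSanitizer (`-fsanitize=address` in GCC and Clang), it should be reported as a heap-use-after-free at the `strlen` call. Running on a second platform is a cruder check of the same assumption.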
Aiming for portability doesn't always mean increasing complexity.
If it takes the form of #ifdef WIN32 directives then yes, you're increasing complexity in the name of portability, but if instead you're making an effort to avoid using platform-specific non-standard features, then that doesn't increase complexity, other than requiring developer discipline.
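As a sketch of the second approach: decoding a little-endian 32-bit field with plain shifts instead of casting the buffer to `uint32_t *`. The cast version quietly depends on the host's byte order and alignment rules, so it gives a different answer on a big-endian target such as s390x. The function name `read_le32` is just an illustrative choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value from the first four bytes of
 * buf. Uses only shifts and ORs, so the result is the same on
 * little-endian and big-endian hosts, and buf needs no particular
 * alignment. No #ifdef required. */
uint32_t read_le32(const unsigned char *buf)
{
    assert(buf != NULL);

    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}
```

For the bytes 0x78 0x56 0x34 0x12 this yields 0x12345678 on any host. The only cost is the discipline of not reaching for the pointer cast.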
Nooo node/electron should be enough for everyone.
Before node/electron, that was msvc.
There is also the occasional article stating that people with multiplatform development experience are simply better at debugging their code.
Don't let s390x and msvc drink your milkshake.