Simplicity awakes
jehanne.io
Correct me if I'm wrong, but doesn't this suffer from a race condition? What if a process requests an awake, but then gets preempted before doing the blocking system call, and then gets awakened in response to its awake request (rather than because of normal scheduling)? Its awake request was already serviced, so if it performs a blocking syscall, it will wait indefinitely.
Alternatively, if the awake request is only done when a blocking syscall is done, doesn't it then suffer from the problem that a random buggy library function could request an awake without then doing a blocking syscall (due to whatever logic bug), so then when the process does a blocking syscall that it expects to block indefinitely, it instead gets a syscall with a timeout?
Wouldn't it be better for the awake syscall to take another syscall as a parameter (pretty simple to do in assembly and should be provided as a C library wrapper), in order to guarantee atomicity?
> Wouldn't it be better for the awake syscall to take another syscall as a parameter (pretty simple to do in assembly and should be provided as a C library wrapper), in order to guarantee atomicity?
Plus in this case the awake call could be named something more intuitive (like syscall_with_timeout or whatever).
> Plus in this case the awake call could be named something more intuitive
This is an interesting objection.
I find awake/awakened/forgivewkp intuitive names, but I'm not a native English speaker.
I'm not going to add the syscall parameter (I considered and discarded that option during the analysis), but I welcome suggestions for a better naming.
Awake in itself is intuitive in some contexts, but it doesn't seem to describe the semantics you want in this case. First of all it's not obvious that it's related to syscalls. Secondly it doesn't really mean the process is guaranteed to awake after the specified time - if the syscall doesn't block or finishes faster, the process might well stay sleeping at the alleged awaking time. Someone who doesn't know all the details will easily get the wrong idea.
Jehanne hacker here.
> doesn't this suffer from a race condition?
This is a good question I should probably clarify in the article as it has been asked before but I can't answer in that forum (see https://lobste.rs/s/fqilcv/simplicity_awakes#c_8pvo0s).
To prevent race conditions, the wakeup can occur only during a blocking system call (and not even all of them: some cannot be interrupted, to avoid unintuitive side effects).
> it then suffer from the problem that a random buggy library function could request an awake without then doing a blocking syscall (due to whatever logic bug), so then when the process does a blocking syscall that it expects to block indefinitely, it instead gets a syscall with a timeout?
This is by design.
The awake idiom described in the article is pretty simple: if you book a time slice you must release it if it didn't expire.
The operating system cannot prevent userspace bugs.
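In C, the idiom looks roughly like this (a minimal sketch; I'm assuming awake() returns a wakeup token that awakened() and forgivewkp() accept, which may not match the exact Jehanne signatures):

static long
read_with_timeout(int fd, void *buf, long nbytes, long ms)
{
	long wkp, n;

	wkp = awake(ms);            /* book a time slice of ms milliseconds */
	n = read(fd, buf, nbytes);  /* the wakeup can only fire while this blocks */
	if(awakened(wkp))
		return -1;              /* the slice expired: the read timed out */
	forgivewkp(wkp);            /* it didn't expire: release it */
	return n;
}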
> Wouldn't it be better for the awake syscall to take another syscall as a parameter
This is an option I discarded during the analysis.
It's a matter of trade-offs: an additional argument would increase the complexity a lot. In particular, you would need to maintain a map of syscall->wakeups in userspace if you want to be able to `forgivewkp` the right one. And, on successful completion of a sequence of syscalls, you would have to `forgivewkp` all unexpired wakeups in such a map.
Thus a single additional parameter would largely increase the complexity of both the kernel implementation and the user space code, making several bugs harder to reproduce.
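To make the discarded option concrete, here is a hypothetical sketch of the bookkeeping it would force on user space (nothing like this exists in Jehanne; PendingWkp and forgiveall are illustrative names):

typedef struct PendingWkp PendingWkp;
struct PendingWkp
{
	long scall;         /* which pending syscall this wakeup guards */
	long wkp;           /* its wakeup, so the right one can be forgiven */
	PendingWkp *next;
};

/* on successful completion of the whole sequence, every
 * unexpired wakeup in the map must still be forgiven */
static void
forgiveall(PendingWkp *p)
{
	for(; p != nil; p = p->next)
		if(!awakened(p->wkp))
			forgivewkp(p->wkp);
}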
"The design must be simple, both in implementation and interface. It is more important for the implementation to be simple than the interface. Simplicity is the most important consideration in a design." - http://jehanne.io
I've been looking for examples of good comments about simplicity to help focus my own view of simple. I love the quote above.
The quote is, of course, originally from Richard P. Gabriel's 1991 essay (linked from this blog post). It's the first characteristic of "worse-is-better".
Note that in the text the quote is from, this approach is considered "worse" (but viral).
The author of the text values correctness and ease of use over simplicity of implementation and interface.
Your customers don't care in the least how simple your system's implementation is. They want it to be reliable and easy to use.
"In a way the so called “New Jersey style” was a rush for a minimum viable product able to minimize the time-to-market and to gain the first mover advantage."
Another way of looking at it is that it is an attempt to get something into existence, rather than waiting years for perfection and ending up with nothing. To paraphrase performance work, something is infinitely better than nothing.
I am somewhat interested in the comment, "(the mindful reader will notice that alarm is still waiting to be moved to user space… the fact is that it’s too boring of a task!)" I originally interpreted it as meaning userspace could interrupt the blocking call, but I see it's actually a filesystem that interacts with kernel space.
fd = create("/dev/alarms/new", ~0, pair_ints(getpid(), ms))
Been a long time since I have looked at Plan 9; create has some interesting arguments.
> Another way of looking at it is that it is an attempt to get something into existence, rather than waiting years for perfection and ending up with nothing.
Much as I liked Multics, this is what happened to it.
> [asymptotic graph trending towards doom]
> How many programs are you running right now? :-D
I don't even know, there are probably thousands of processes running right now, totaling hundreds of millions of lines of code.
And it all works perfectly fine, especially as long as I don't update anything. I just don't do the things that don't work. The things that don't work, they generally don't work 100% of the time. The things that do work, they generally work 100% of the time. Some software might fail randomly and frequently, in which case I might not use it either, unless failure is easily recovered from (which is often the case).
I don't need a system that is really simple and (as a consequence) super-reliable. I need a system that runs my software and that is fault-tolerant. After all, even entirely correct software cannot prevent hardware faults (which do occur).
> The things that don't work, they generally don't work 100% of the time. The things that do work, they generally work 100% of the time.
If you exclude the guns that kill, guns are safe.
If you exclude all security vulnerabilities of the last decade, all mainstream software is secure.
> I don't need a system that is really simple and (as a consequence) super-reliable.
I think you are overlooking how pervasive computing is in your life.
But I can see how a user who has no programming experience could refuse to accept the sad state of today's computing.
> After all, even entirely correct software cannot prevent hardware faults (which do occur).
You are misreading the intent here: as artifacts built by fallible humans, no software can be perfect.
But if you don't even try to keep complexity low, it will soon become unmanageable and expensive.
Still, as Gabriel said in his essays, you are right that users can be manipulated to accept and even pay for crap.
It's called marketing.
But I don't like it.
> If you exclude the guns that kill, guns are safe.
There are indeed guns which are ridiculously unsafe to use and if you just count all guns in the world and average their failure rates, then "on average" guns are less safe. The kind of gun you can legally buy, properly handled, is quite safe - as far as guns go anyway.
The point I am making is that if you just average stuff out (like with the graph), it does not reflect reality. The computer systems that work in reality have very high reliability. Those that don't work > 99% of the time are simply not deployed.
> If you exclude all security vulnerabilities of the last decade, all mainstream software is secure.
All mainstream software is "secure enough", just like all mainstream software is "reliable enough". Otherwise, we obviously couldn't use mainstream software, we would all be forced to use provably correct software that is far more expensive to develop. In practice, the biggest security problem sits at the other end of the screen and no piece of software can fix it.
> I think you are overlooking how pervasive computing is in your life.
> But I can see how a user who has no programming experience could refuse to accept the sad state of today's computing.
Believe it or not, I'm an experienced programmer and that has taught me pragmatism, above all things. I could complain about the state of computing all day, but the reality is that it works. It really does. You just have to admit that. Could it be better in practice? Maybe, maybe not. There's only so much effort in the world that can be spent on improving software and actually deploying it (which is the difficult part when comes to new software).
> You are misreading the intent here: as artifacts built by fallible humans, no software can be perfect.
> But if you don't even try to keep complexity low, it will soon become unmanageable and expensive.
I'm not arguing against that, I'm arguing against what that particular graph insinuates. The idea that nothing works anymore when the sum of all unreliable parts creates a completely unreliable result. That doesn't happen in practice with the actual operating systems (and other systems) that we use.
Keeping things simple is of course desirable, but it's also not easy at all and it requires a great level of skill and care. We don't have that kind of skill to work with, at least not for the vast majority of software out there.
> Still, as Gabriel said in his essays, you are right that users can be manipulated to accept and even pay for crap.
> It's called marketing.
That's just naive. It's not like users always have a choice between expertly crafted high quality software and crap software, but then they choose crap because of marketing. They have a choice between Microsoft Office and LibreOffice, both of which are crap. They pay for Microsoft Office because it works better with what everyone already uses (Microsoft Office) or they choose LibreOffice to save money. That's just one example, but there are countless others.
> I'm arguing against what that particular graph insinuates. The idea that nothing works anymore when the sum of all unreliable parts creates a completely unreliable result.
No.
That graph shows the probability of the whole system working correctly (i.e. as the user expects) if each component is 99% correct.
I confirm this.
But I cannot say how severe the bug you will face will be. I never said "nothing works anymore".
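For concreteness (my arithmetic, not a figure from the article): with n independent components that are each 99% correct, the whole behaves correctly with probability 0.99^n, which is about 90% for n = 10, 37% for n = 100, and 0.004% for n = 1000.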
> That doesn't happen in practice with the actual operating systems (and other systems) that we use.
You overlook the failures.
Each failure, big or small, counts in that graph.
> They have a choice between Microsoft Office and LibreOffice, both of which are crap.
Sorry, I explained myself badly.
Gabriel says this in a more diplomatic way: "users have already been conditioned to accept worse than the right thing".
"Conditioned" aka manipulated aka marketing.
I meant that by only proposing crap against crap you promote crap.
> That graph shows the probability of the whole system working correctly (i.e. as the user expects) if each component is 99% correct.
Yes, it shows that. But what's the point of showing it? It insinuates that there is a problem here. There isn't. Real systems have 99%+ uptime, or they aren't deployed. With the software that we actually use, we're at the far left of that graph, not anywhere near the right.
> You overlook the failures. Each failure, big or small, counts in that graph.
Eh, not really. It's not statistics based on real data, it's a hypothesis. No real-world failure shows up in it. Again, there are hundreds if not thousands of processes running on your average Linux box, but failure rates are astonishingly low. Yet, Linux is the total opposite of "the right thing".
I don't see you arguing with that, because you can't argue with it. It's the facts! Not doing "the right thing" works. Doing "the right thing" generally doesn't, because that software never ships on time. All the beautiful operating systems dreamed up inside of ivory towers never took the market. It's not because of "marketing" or "conditioning", but because that software is not actually better for the end user. It lacks features, it's more expensive, it's late. It then doesn't matter if it's simple.
> But what's the point of showing it?
To reason about reliability and its impact on costs.
> Real systems have 99%+ uptime, or they aren't deployed.
Uptime is not correctness.
> failure rates are astonishingly low [...]
> I don't see you arguing with that, because you can't argue with it. It's the facts!
No, it's your perception.
These are facts:
- https://www.debian.org/Bugs/
- https://bugzilla.redhat.com/query.cgi
- https://bugzilla.kernel.org/describecomponents.cgi
- https://bugzilla.gnome.org/query.cgi
- https://bugs.kde.org/describecomponents.cgi
- https://bugzilla.mozilla.org/describecomponents.cgi
- https://bugs.chromium.org/p/chromium/issues/list
Do a search in any of these issue trackers and you will be overwhelmed with facts.
Now, I agree that, with huge efforts and costs, over decades many developers and companies managed to go beyond 99% correctness on some projects.
But with simpler systems and designs, the cost of reaching such a level of quality (which most software does not even aim for) would be a tiny fraction.
> Not doing "the right thing" works. Doing "the right thing" generally doesn't
I wonder if you read the article at all.
I proposed a third style: simplex sigillum veri (simplicity is the seal of truth).