CVE-2024-6409: OpenSSH: Possible remote code execution in privsep child (openwall.com)
No vulnerability name, no website, concise description, neutral tone, precise list of affected distros (RHEL + derivatives and some EOL Fedoras), and even a mention of unaffected distros (current Fedoras), plain admission that no attempt was made to exploit. What a breath of fresh air!
(I am only joking of course. As a recovering academic, I understand that researchers need recognition, and I have no right to throw stones -- glass houses and all. Also, this one is really like regreSSHion's little sibling. Still, easily finding the information I needed made me happy.)
I don't think recognition for researchers is the big win for named vulnerabilities. In the places that matter, they can just describe their findings in a short sentence and get all the recognition that matters. The names are mostly for the benefit of users.
Security researchers definitely do the naming gimmick for personal brand purposes. This may not be as obvious when it’s successful, but academic papers routinely name vulnerabilities when there is no real benefit to users.
The whole point of naming vulnerabilities is to establish a vernacular about them, so it's not surprising that academic papers name them. The literature about hardware microarchitectural attacks, for instance, would be fucking inscrutable (even more than it is now) without the names.
I'd be happy to file all of them under Spectre/MDS, except for the ones that aren't Spectre/MDS, of course. They don't all need unique names. Most of them are instances of the same pattern: some value is not present in a register when it's needed, and an Intel CPU design continues to execute speculatively with the previous contents of that register instead of inserting a pipeline bubble, leaking those previous contents. Exploiting an inter-core communication buffer instead of a load data buffer, as the last person did, doesn't deserve a new name and logo. A new write-up, yes.
Wikipedia puts them all under one page: https://en.wikipedia.org/wiki/Transient_execution_CPU_vulner...
I don't even understand the impulse to lose the names. Names aren't achievement awards. We already have Best Paper awards at the Big 4 and the Pwnies (for however seriously you take that). The names don't cost anybody anything, and they're occasionally helpful.
Name them all.
You see the same weird discussions about CVEs, and people wanting to squash CVEs down (or not issue them at all) because the research work is deemed insufficient to merit the recognition. As if recognition for work was ever even ostensibly what the CVE program was about.
The author of the mail is Solar Designer, a bit of a legend AFAIC. He has no need to pump up his brand and he really really knows what he's doing.
Yeah. He created openwall and the oss-security list.
At least they do not name them after themselves.
For clarification, the bug is in a patch applied by Red Hat, not in OpenSSH itself.
Technically the bug is in upstream code, but it is latent without the Red Hat patch:
> cleanup_exit() was not meant to be called from a signal handler [...] Fedora 38+ has moved to newer upstream OpenSSH that doesn't have the problematic cleanup_exit() call.
> This extra problematic logic only existed in upstream OpenSSH(-portable) for ~9 months
The fix also doesn't touch the Red Hat-specific code:
They suggest applying it even on non-Red Hat distros.

    diff -urp openssh-8.7p1-38.el9_4.1-tree.orig/sshd.c openssh-8.7p1-38.el9_4.1-tree/sshd.c
    --- openssh-8.7p1-38.el9_4.1-tree.orig/sshd.c	2024-07-08 03:42:51.431994307 +0200
    +++ openssh-8.7p1-38.el9_4.1-tree/sshd.c	2024-07-08 03:48:13.860316451 +0200
    @@ -384,7 +384,7 @@ grace_alarm_handler(int sig)
     	/* Log error and exit. */
     	if (use_privsep && pmonitor != NULL && pmonitor->m_pid <= 0)
    -		cleanup_exit(255);	/* don't log in privsep child */
    +		_exit(1);	/* don't log in privsep child */
     	else {
     		sigdie("Timeout before authentication for %s port %d",
     		    ssh_remote_ipaddr(the_active_state),

Sort of. The upstream bug isn't thought to be exploitable alone.
Couldn't this entire class of bug be solved by annotating signal handlers in the source code and checking at compile time that anything called from a signal handler is async-signal-safe?
There are some static analysis tools that can check this.
Cert's SIG30 rule page has a list: https://wiki.sei.cmu.edu/confluence/display/c/SIG30-C.+Call+...
Also there's https://clang.llvm.org/extra/clang-tidy/checks/bugprone/sign...
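For example, the bugprone-signal-handler check starts from the registration call and flags non-async-signal-safe calls reachable from the handler. A minimal sketch of the kind of code it rejects (a hypothetical program for illustration, not from OpenSSH):

    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static void grace_handler(int sig)
    {
        (void)sig;
        /* Flagged: printf() and exit() are not async-signal-safe.
           exit() runs atexit() handlers and flushes stdio, which can
           take locks already held by the interrupted code. */
        printf("timeout\n");
        exit(1);
        /* _exit(1) would be the safe replacement, as in the fix above. */
    }

    int main(void)
    {
        signal(SIGALRM, grace_handler); /* analysis starts at this registration */
        alarm(1);
        pause();
        return 0;
    }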
Sounds reasonable, but since the language layer has no knowledge of signal handlers or what that means, it would be a separation of concerns problem. I'm sure you could get clang to do it, but still a tricky thing to design around.
Ultimately it's an example of an invariant where it's clear that programmers can't be trusted to uphold it. In this case, the consequences can be very significant.
> the language layer has no knowledge of signal handlers or what that means
Despite the fact that there is explicit runtime support for signal handlers in the language runtime (i.e. libc).
Libc isn't the language runtime. The runtime is '/usr/lib/crt*.o', which has no concept at all of signal handling.
Libc isn't particularly intrinsic to the language, and outside of some assembly to make syscalls, you can implement an alternative with a completely different interface, purely in C.
The language standard library does, however, contain explicit support for signal handling, as specified in ISO/IEC 9899 (the C standard), section 7.14 Signal handling <signal.h>. The cross-platform bits of libc are specified in the C standard. The POSIX-specific bits are specified by the Open Group in the POSIX standard. The OS-specific bits are specified by the OS and implemented by whoever is writing the libc in question. A libc is a sort of statically linked combination of the C standard library and some OS-specific standard library extensions.
I am fairly certain that glibc uses SA_RESTORER in its sigaction wrapper and implements a suitable sigreturn() function which is provided as the sa_restorer argument.
Sure, as can any library.
Yep, such is C.
Static analysis tools would go a long way here, yes, and it should be a relatively straightforward analysis. You probably don't even need to explicitly annotate signal handlers, just examine arguments to calls to signal() and sigaction().
This entire class of bug could also be solved by avoiding signal handlers. You can still use SIGALRM for a timeout, but don't log it. If you need complex processing, use signalfd to read signals in the event loop.
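A minimal sketch of that signalfd pattern (Linux-specific, error handling omitted for brevity):

    #include <poll.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/signalfd.h>
    #include <unistd.h>

    int main(void)
    {
        sigset_t mask;
        sigemptyset(&mask);
        sigaddset(&mask, SIGALRM);
        sigprocmask(SIG_BLOCK, &mask, NULL);  /* stop async delivery */

        int sfd = signalfd(-1, &mask, 0);     /* signals become readable events */
        alarm(5);                             /* the login-grace-style timeout */

        struct pollfd pfd = { .fd = sfd, .events = POLLIN };
        while (poll(&pfd, 1, -1) > 0) {
            struct signalfd_siginfo si;
            read(sfd, &si, sizeof si);
            if (si.ssi_signo == SIGALRM) {
                /* ordinary code: free to log, allocate, take locks, etc. */
                fprintf(stderr, "timeout\n");
                break;
            }
        }
        close(sfd);
        return 0;
    }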
Well, the compiler has no way of knowing if a function will later be a signal handler after linking, or even dynamic loading.
There is no portable way to annotate all functions ever written or ever will be written as being async signal safe.
Which functions are async signal safe varies with the operating system and runtime (eg. an unsafe function in linux-gnu might be safe in linux-musl or linux-bionic).
Other than those insurmountable problems, yeah, good idea.
What I want when writing a signal handler is to be able to say, this function must be async-signal-safe and therefore all the functions it calls must be async-signal-safe. That can be done purely at compile time; I don’t need to worry about linking.
The annotation does not need to be portable; if it’s present on one system then other systems still benefit because the code is written to pass the check.
The list of async-signal-safe functions is well documented and quite short, so it would not be much work to add the annotations to the header files. It’s OK if some safe functions are omitted, because signal handlers should be written to do the absolute bare minimum.
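A sketch of what that could look like. The attribute here is hypothetical; neither GCC nor Clang implements it today, and the rule would be that an annotated function may only call other annotated functions:

    /* Hypothetical annotation, not a real compiler feature. */
    #define ASYNC_SIGNAL_SAFE __attribute__((async_signal_safe))

    /* Header annotations: _exit() is on the documented safe list,
       syslog() is not, so it stays unannotated. */
    ASYNC_SIGNAL_SAFE void _exit(int status);
    void syslog(int priority, const char *format, ...);

    ASYNC_SIGNAL_SAFE static void grace_handler(int sig)
    {
        (void)sig;
        syslog(3, "timeout");  /* would be a compile error: unannotated callee */
        _exit(1);              /* fine: callee is annotated safe */
    }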
No, any function can call another function from another translation unit (at link time) or load and call a function from another translation unit (at runtime). How will the compiler enforce the propagation of the requirement in those cases?
> compiler has no way of knowing if a function will later be a signal handler after linking, or even dynamic loading
You could check it at runtime.
Just like with array bounds checking, the compiler could often prove the runtime check isn't necessary and eliminate it.
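One way to do that runtime check, as a minimal sketch: route every handler through a trampoline that sets a flag, and have non-async-signal-safe library functions assert on it (this assumes all handlers are registered through the trampoline and ignores nested signals for brevity):

    #include <assert.h>
    #include <signal.h>

    static volatile sig_atomic_t in_signal_handler;

    /* Every library function known to be unsafe checks the flag. */
    void some_unsafe_function(void)
    {
        assert(!in_signal_handler); /* elidable when provably unreachable */
        /* ... take locks, call malloc(), etc. ... */
    }

    /* All handlers are registered through this trampoline. */
    static void (*real_handler)(int);

    static void trampoline(int sig)
    {
        in_signal_handler = 1;
        real_handler(sig);
        in_signal_handler = 0;
    }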
> Which functions are async signal safe varies with the operating system and runtime
Annotations could enumerate specific platforms where it is safe or unsafe. Or you could annotate based on specific attributes of platforms that make it safe or unsafe.
> Well, the compiler has no way of knowing if a function will later be a signal handler after linking, or even dynamic loading.
That's why GP suggested annotating them. Typically this would be done via a function attribute.
> There is no portable way to annotate all functions ever written or ever will be written as being async signal safe.
This is not a requirement for such an annotation to exist and to be used by projects that care about security or even just correctness.
> Which functions are async signal safe varies with the operating system and runtime (eg. an unsafe function in linux-gnu might be safe in linux-musl or linux-bionic).
And libc implementations already annotate many of their functions to tell the compiler how they work. Compilers are also more than happy to assume the behavior of standard functions matches the C/C++ standards in non-freestanding environments.
> Other than those insurmountable problems, yeah, good idea.
All fairly trivial problems that have already been solved many times for similar issues.
I'd like a more general attribute though, to declare that a particular function is in some abstract domain, plus annotations that certain functions may or may not be called in certain domains. This could come in useful where you want some functions to only be called from special threads.
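Clang's thread-safety analysis can already approximate this, since its "capabilities" are abstract and need not be locks; one can stand for a domain such as "runs on the UI thread". A sketch repurposing the real attributes (compile with -Wthread-safety; this is an assumption about how far the analysis stretches, not a dedicated feature):

    /* A capability repurposed as an abstract "thread role" domain. */
    struct __attribute__((capability("role"))) thread_role { int dummy; };

    extern struct thread_role ui_thread; /* token: "currently on the UI thread" */

    void draw_widget(void) __attribute__((requires_capability(ui_thread)));

    void worker_task(void)
    {
        draw_widget(); /* -Wthread-safety should warn: requires 'ui_thread' */
    }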
> That's why GP suggested annotating them. Typically this would be done via a function attribute.
That won't help when you link external functions or worse, dynamically load them. Those are things done long after the compiler has run.
> And libc implementations already annotate many of their functions to tell the compiler how they work. Compilers are also more than happy to assume behavior of standard function matches the C/C++ standards in non-freestanding environmnets.
We're not talking about standard functions here, we're talking about any function any developer could ever call in a signal context. Ever. Like, for example, a libssh shutdown function that invokes a callback that calls a syslog function that does some socket operation on a buffer that some other thread has already freed. Which of those functions needs the annotation, and how does dlsym() deal with it?
Your reply is akin to saying that static analyzers are pointless because of the halting problem.
This is why I've always disliked Debian and Red Hat.
1. I hate the fact they have the hubris to think they can be smarter than the upstream developers and patch old versions
2. I hate the fact they don't ship vanilla packages, but instead insist on patching things for features that nobody relies on anyway, __because they're not upstream__.
Maintainers should stick to downloading tarballs, building them, and updating them promptly when a new version is out. If there's no LTS available, pay upstream and get an LTS; don't take a random version and patch it forever just to keep the same version numbers. It's nonsensical, and it was only a matter of time before people tried to exploit it. Just look at the XZ backdoor for instance, which relied on Red Hat and Debian patching OpenSSH to link against libsystemd (which in turn pulls in the backdoored liblzma).
Enterprises don't go for RHEL because it's free software, yay freedom!
They go for it because it gives a very stable, solid foundation. They don't want a fragile base layer prone to breaking every day of the week.
This involves backporting a lot of stuff (primarily security fixes) because you can't just upgrade any package to its latest version, it will have entirely new dependencies, potentially breaking changes etc.
What should Red Hat do that does not:
1) make them lose their enterprise customers wanting a stable base,
2) leave unpatched security holes all over their distros,
3) require them to backport stuff (which is where we are at the moment)?
Companies I've worked for that use Red Hat do so because they think paying will spare them the work. As if, suddenly, running nearly 10-year-old code was no longer stupid because you paid for it.
> As if, suddenly, running nearly 10-year-old code was no longer stupid because you paid for it.
I love this
Of course it is stupid; the question is whether you have an alternative that isn't stupid and respects all the regulation/certification requirements.
I wish the places I worked made such principled decisions.
Some places did, but not most. In my two decades of contracting I have seen plenty of shops with a real fear of upgrading and no plan for modernizing. They are trapped in decades-old tech and prefer it that way for no discernible reason. Worse, they often have no recovery plan if there is a problem. There is a huge amount of maintaining the status quo and trying not to make waves.
For some of these projects, a team of a dozen devs could recreate the core product in some new tech in less than a year with the right institutional knowledge. But they don't, for a myriad of excuses and reasons.
To be fair, if they couldn't make the decision to use a non-paid distro, would you trust them to be able to manage the updates, the ABI compatibility breakages, and so on once every year or two?
Of course not
But again, paying for the distribution does not free you from sysadmin duties.
Even worse: because of the long "supported" duration, the common mindset is "fire and forget". After all, why would we care? That stuff will be "supported" for 10 years, and we'll be long gone by then. And when you have a high turnover rate, you hit the champion's title: everything is legacy, nothing is managed, everything is crap.
You give so many good reasons to use RHEL. Less noble, but still good :)
Said companies have no regulatory or certification requirements.
I work in the kernel maintenance group; the "10 year old code" is having fixes for select important and critical flaws applied.
If you think about it, all maintained projects of old code have this same mechanism, just with more frequent updates.
How does the age metric work once you know this?
I understand the business logic behind that. The point is, maybe they should consider paying the upstream developers to backport the stuff themselves instead of dabbling with C code they somewhat understand?
C isn't magic; plenty of people understand it, and lots of these projects move quite slowly. That CVEs on ssh are so rare shows how well this process normally works. These past couple of weeks have had 3(?) ssh vulnerabilities? We often go years with just one, and not all are a result of packaging; some come from upstream.
Any new process needs to not just fix this problem, but also all, or at least most, of the problems that the existing process fixes.
Tracing thousands of developers across the planet and drafting contracts in hundreds of jurisdictions, including some where you don't even have a branch office or any kind of legal presence? Ugh. And what if one's from an embargoed country? What if a prima donna asks for a million a day, or half of them don't deliver in time, go on holiday, win the lottery, fall in love, get into a dispute, refuse to work with you, don't have access to all the architectures for comprehensive testing, lose interest, change employer (and can't work with you anymore), or sell out to some dodgy entity preparing the next supply-chain attack?
...better to pay your own people. Hire them if they're available, sure; otherwise task an engineer with this.
Distributions patch to get consistent and integrated behaviour across the platform, including new features that require implementation into multiple places to be even minimally useful, and ports to newer APIs to make everything work against a single version of each library so that they do not conflict.
Upstreams generally don't do this until prompted, and sometimes resist until the path is proven and becomes best practice because distributions pushed it.
This process is mostly invisible to you because distributions have been successful at getting the changes needed for sane and consistent behaviour embedded into tools and expectations by default. All you see are the patches that are in flight or didn't make it.
If you want a distribution that does only minimal patching then there are distributions for that. The fact that the major distributions do patch speaks volumes about which approach results in a better experience for users.
Do you think there is a reason that some distros go through all of this additional trouble?
The risk you take when you use a distribution that modifies upstream. Debian has had similar issues in the past (maybe not CVEs, but certainly packager-created bugs).
It's risks all the way down. There are risks to not patching upstream as well.
Debian has a fairly famous one: CVE-2008-0166
Ouch, that one's bad: https://github.com/g0tmi1k/debian-ssh#the-bug
>These lines were removed because they caused the Valgrind and Purify tools to produce warnings about the use of uninitialized data in any code that was linked to OpenSSL. Removing this code has the side effect of crippling the seeding process for the OpenSSL PRNG. Instead of mixing in random data for the initial seed, the only "random" value that was used was the current process ID. On the Linux platform, the default maximum process ID is 32,768, resulting in a very small number of seed values being used for all PRNG operations.
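Illustratively (not OpenSSL's actual code), the crippled seeding was equivalent to something like this:

    #include <stdlib.h>
    #include <unistd.h>

    /* After the Debian patch, the PID was effectively the only entropy
       mixed in. With Linux's default pid_max of 32768, that is at most
       ~32k possible keystreams per architecture/key-type/key-size combo. */
    static void seed_prng(void)
    {
        srand((unsigned)getpid());
    }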
In that particular case upstream _was_ consulted and had acked the patch.
Upstream was consulted for a similar change in another location, where the code was actually unnecessary.
My understanding here is that it only impacts Red Hat (and maybe derivatives)?
Yes, only RHEL 9 (the current version of RHEL) and its upstreams/downstreams (CentOS Stream 9, Rocky Linux 9, AlmaLinux 9, ...).
Also affected: Fedora 37, 36 and possibly 35, which are all end-of-life (since December 2023 in the case of Fedora 37).
Not affected: Fedora 38 (also EOL), 39 (maintained) and 40 (current).
Is this in any way related to CVE-2024-6387 "RegreSSHion" discussed last week?
https://news.ycombinator.com/item?id=40843778
Edit: Ok it seems very closely related; I was just surprised no one had linked the previous discussion.
It's almost as if you should understand security-critical C code before you start patching it to death.