Speculative execution, variant 4: speculative store bypass
If you are using Linux-based virtualization (KVM), then besides an updated kernel and Intel microcode (which is not yet available), you will also need updates to the relevant layers: QEMU and libvirt. Patches are posted[1][2].
Virtual machines now need to have a new Intel CPU feature flag exposed to them: 'ssbd' (Speculative Store Bypass Disable).
On microcode, from Red Hat's blog post[3]:
In many (but not all) cases, full mitigation will also require updated microcode from the system microprocessor vendor. Red Hat intends to ship updated microcode as a convenience to our customers as it is made available to us. In the interim, customers are strongly advised to contact their OEM, ODM, or system manufacturer to receive this via a system BIOS update.
[1] https://www.redhat.com/archives/libvir-list/2018-May/msg0156...
[2] https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg047...
[3] https://www.redhat.com/en/blog/speculative-store-bypass-expl...
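For reference, once the updated QEMU/libvirt stack is in place, exposing the flag to a guest looks roughly like this in the libvirt domain XML (a sketch only; the model name is an example and the right policy depends on your setup):

```xml
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>Skylake-Server-IBRS</model>
  <!-- 'ssbd' additionally requires updated microcode on the host -->
  <feature policy='require' name='ssbd'/>
</cpu>
```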
AMD guidance:
https://developer.amd.com/wp-content/resources/124441_AMD64_...
(Setting a CPU-specific MSR is enough for current CPUs; no microcode updates are required.)
https://www.amd.com/en/corporate/security-updates has: "We have not identified any AMD x86 products susceptible to the Variant 3a vulnerability in our analysis to-date."
For AMD, in the context of virtualization — you would also need to expose a new CPUID flag: 'virt-ssbd', which all hypervisor vendors will expose to guests on AMD hosts. More from the libvirt patch[1]:
Some AMD processors only support a non-architectural means of enabling Speculative Store Bypass Disable. To allow simplified handling in virtual environments, hypervisors will expose an architectural definition through CPUID bit 0x80000008_EBX[25]. This needs to be exposed to guest OS running on AMD x86 hosts to allow them to protect against CVE-2018-3639.
Note that since this CPUID bit won't be present in the host CPUID results on physical hosts, it will not be enabled automatically in guests configured with "host-model" CPU unless using QEMU version >= 2.9.0. Thus for older versions of QEMU, this feature must be manually enabled using policy=force. Guests using the "host-passthrough" CPU mode do not need special handling.
[1] https://www.redhat.com/archives/libvir-list/2018-May/msg0156...
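Concretely, for the AMD case described above, the domain XML would look something like this (a sketch; the model name is an example, and policy='force' is only needed on the older QEMU versions mentioned, since the synthetic bit is absent from the host CPUID):

```xml
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>EPYC</model>
  <!-- synthetic bit (CPUID 0x80000008_EBX[25]): force it because the
       physical host's CPUID does not report it -->
  <feature policy='force' name='virt-ssbd'/>
</cpu>
```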
I like that it's specex variant 4 and spectre variant 3. Keeps everybody sharp.
Special register read is called "variant 3a" because it allows you to break the privilege level separation like Meltdown and, back in November, Meltdown was called "variant 3". Variant 1 was conditional-branch Spectre (speculative out of bounds accesses) while variant 2 was indirect-branch Spectre (the one that could be used to read host memory from a virtual machine).
These are the links I found most explanatory:
https://bugs.chromium.org/p/project-zero/issues/detail?id=15...
https://software.intel.com/sites/default/files/managed/b9/f9...
https://software.intel.com/sites/default/files/managed/c5/63...
https://blogs.technet.microsoft.com/srd/2018/05/21/analysis-...
https://developer.amd.com/wp-content/resources/124441_AMD64_...
https://www.intel.com/content/www/us/en/security-center/advi... uCode update is only for variant 3a (MSR read) and for the global disable bit in the MSR. The standard mitigation is still LFENCE.
https://docs.microsoft.com/en-us/cpp/security/developer-guid... vulnerable code examples
This is actually less clear (at least for me) than the project zero post.
Can you share the link? I found the redhat article clearer than the current chromium FP post :)
---
edit: I wasn't able to find anything new since Jan 3rd about the speculative bypass from Project Zero.
Some additional articles about the newly revealed Variant 4:
https://wiki.ubuntu.com/SecurityTeam/KnowledgeBase/Variant4
https://xenbits.xen.org/xsa/advisory-263.html
https://www.cnet.com/news/intel-microsoft-reveal-new-variant...
https://newsroom.intel.com/editorials/addressing-new-researc...
>Can you share the link? I found the redhat article clearer than the current chromium FP post :)
I was talking about https://bugs.chromium.org/p/project-zero/issues/detail?id=15...
A commenter over at arstechnica (https://arstechnica.com/gadgets/2018/05/new-speculative-exec...) found an old article explaining the optimization which led to this vulnerability: "Faster Load Times - Intel Core versus AMD's K8 architecture" https://www.anandtech.com/show/1998/5
Possibly completely unrelated question (this stuff is firmly over my head): toward the end of the first PoC there's
/* if we don't break the loop after some time when it doesn't
work, in NO_INTERRUPTS mode with SMP disabled, the machine will lock
up */
The bit at the top of the PoC that says "======== Demo code (no privilege boundaries crossed) ========" is suggestive and unambiguous, but the program executions show (with "$"s) that this is being executed as non-root. So... is this deadlock fundamentally related to the speculative execution glitch(es)?
That makes me curious; I wonder what on a modern PC relies on periodic interrupts, and would lock up the machine if they didn't occur. I know on the original PC and XT, DRAM refresh relied on interrupts but that stopped being the case with the AT:
https://www.reenigne.org/blog/how-to-get-away-with-disabling...
Isn't preemption still interrupt-based? Without interrupts, there would be nothing to cause the CPU to stop executing the demo code. With SMP disabled, this means that nothing else will get a chance to run until the demo code yields itself (or re-enables interrupts).
You wouldn't be able to disable interrupts as non-root. The iopl syscall allows the PoC to use CLI to disable interrupts. See the "sudo" in the NO_INTERRUPTS runs:

$ gcc -o test test.c -Wall -DHIT_THRESHOLD=50 -DNO_INTERRUPTS
$ sudo ./test

I would guess the deadlock is due to a hardware watchdog timer rebooting the system, or some other hardware function that needs to be tended to periodically before it hangs.

It doesn't look like a deadlock; turning off interrupts prevents the preemptive scheduler from running. Without a timer interrupt, the only way the scheduler would run is if it's invoked to put the process to sleep during a blocking syscall or explicitly with sched_yield(2), pthread_yield(3), etc.
If interrupts are off, the PoC program might wait forever for "hits > 32" if testfun() never detects a "hit". Giving up after 1M busy loops ("cycles < 1000000") should prevent this from happening... but... I wonder...
gcc -o test test.c -Wall -DHIT_THRESHOLD=50 -DNO_INTERRUPTS

Without -O0, some optimizations are still enabled. Could a modern "clever" optimizing compiler assume that the speculative "hit" never happens, therefore conclude that "cycles" is only used after the loop when it is "guaranteed" to have the value 1000000, and "optimize" the loop into something like:

/*long cycles = 0;*/                        //DEAD
while (hits < 32 /*&& cycles < 1000000*/) { //DEAD
    // ... rest of loop body, maybe?
    /* cycles++; */                         //DEAD
    pipeline_flush();
/*}*/                                       //DEAD

and the sprintf() into something like:

sprintf(out_, "%c: %s in 100000 cycles (hitrate: %f%%)\n",
        secret_read_area[idx], results, 100*hits/(double)(100000));

I'm probably worrying about nothing. Or at least I should be worrying about nothing, but with the current trend of "clever" optimizers exploiting everything they think is provable, I'm no longer certain. bleh

The pipeline_flush() asm block has a "memory" clobber which will certainly prevent this kind of optimisation.
Woops! Completely missed that, heh.
There go my plans for non-root system lockup :(
Locking up the machine is a possibility when you disable interrupts. Disabling interrupts with the cli instruction needs the IO privilege level (IOPL) to be at least as high as the protection ring the code is running in. Linux runs userspace code in ring 3, so the IOPL has to be set to 3 with the iopl call first. This requires the CAP_SYS_RAWIO capability, which allows you to do pretty much anything already.
The MS advisory: https://portal.msrc.microsoft.com/en-US/security-guidance/ad...