Xen hypervisor memory corruption due to x86 emulator flaw
xenbits.xen.org

This bug's existence and its patch have already worried some people. From the Qubes OS developers: [1]
Additional thoughts by Qubes Security Team
===========================================
We see several problems that concern us about this vulnerability and
patching process:
1) It seems really difficult to understand why anybody would design a
structure like the one shown above, which uses a union to store two
RADICALLY DIFFERENTLY TRUSTED kinds of data: an internal pointer into
hypervisor memory and VM-provided UNTRUSTED DATA. Such a design
decision, made by one of the core hypervisor developers, is certainly
worrying. We're not sure whether it would be more worrying if this was
done purposely or out of carelessness...
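For readers without the advisory in front of them, here is a minimal sketch of the kind of construct being criticized. All names are hypothetical, not Xen's actual definitions: a union overlays a trusted hypervisor pointer with guest-decoded data, discriminated only by a type tag.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only (not Xen's actual code): a union that overlays a
 * trusted hypervisor pointer with guest-controlled data, selected by
 * a type tag that the decoder is responsible for keeping in sync. */
struct operand {
    int type;                  /* 0 = register operand, 1 = immediate */
    union {
        unsigned long *reg;    /* trusted: pointer into hypervisor state */
        unsigned long val;     /* untrusted: value decoded from the guest */
    } u;
};

/* If `type` is ever left stale from a previous decode, guest-supplied
 * `val` bits get dereferenced as the `reg` pointer: the essence of
 * the class of flaw discussed above. */
unsigned long read_operand(const struct operand *op)
{
    return op->type == 0 ? *op->u.reg : op->u.val;
}
```

The defensive alternative the Qubes team alludes to would simply use two separate fields, trading a word of memory for the impossibility of this confusion.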
2) We are not entirely convinced that the way the Xen Security Team
decided to address this vulnerability is really optimal,
security-wise. It seems like a more defensive approach would be to get
rid of this dangerous construct of reusing the same memory for both an
internal pointer and VM-provided data. Apparently the Xen developers
believe that they can fully understand the code, with all its
execution paths, for decoding x86 operands. This optimistic attitude
seems surprising, given the very bug we're discussing today.
3) This lack of defensive programming, and perhaps overconfidence (in
the ability to fully understand all the code paths), has been
demonstrated by the Xen Security Team previously as well. In the
recently released XSA 109 [2], the official patch also addressed the
problem much earlier in the execution path rather than at the actual
offending instructions, i.e. those that performed the NULL
dereference. When asked specifically about adding at least an
additional check on these instructions, the Xen developers were
unwilling to implement it, citing potential performance impact.
4) This is all certainly a bit disconcerting, and we hope to start a
bit more public debate on these issues, especially among independent
security researchers. We still believe Xen is currently the most
secure hypervisor available, mostly because of its unique
architectural features, which are lacking in any other product we are
aware of.
[1]: https://raw.githubusercontent.com/QubesOS/qubes-secpack/mast...

> We still believe Xen is currently the most secure hypervisor available, mostly because of its unique architectural features, which are lacking in any other product we are aware of.
Does anyone know why KVM would be considered less secure than Xen?
(From memory, there are some design docs for Qubes OS floating around that discuss this.) Xen is relatively small and contained, while KVM sits on top of a full Linux kernel and can potentially access all of it, making it harder to tell what is accessible/exploitable and what is not. KVM also uses QEMU running as a process on the host Linux to interface with the VM, again exposing more potential attack surface. And I think Xen is better at isolating drivers, which for Qubes OS is a fundamental principle.
>And I think Xen is better at isolating drivers
Xen allows for creating an entire stub domU solely for running the driver, then giving a running guest access via a ring buffer in a shared memory segment.
(So, yep, you're correct in your thinking.)
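As a rough illustration of that mechanism, here is a minimal single-producer/single-consumer ring buffer in C. A real Xen ring lives in a granted shared page, uses memory barriers, and pairs with event-channel notifications; everything here (names, sizes) is made up for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy shared-memory ring of the general shape used between a driver
 * domain and a guest. Simplified: no barriers, no notifications,
 * one producer and one consumer. */
#define RING_SIZE 8  /* power of two, so masking wraps the indices */

struct ring {
    unsigned int prod;        /* next slot the producer writes */
    unsigned int cons;        /* next slot the consumer reads */
    int slots[RING_SIZE];
};

int ring_put(struct ring *r, int v)
{
    if (r->prod - r->cons == RING_SIZE)
        return 0;                              /* ring is full */
    r->slots[r->prod++ & (RING_SIZE - 1)] = v;
    return 1;
}

int ring_get(struct ring *r, int *v)
{
    if (r->prod == r->cons)
        return 0;                              /* ring is empty */
    *v = r->slots[r->cons++ & (RING_SIZE - 1)];
    return 1;
}
```

The indices never get masked until use, so unsigned wraparound keeps the full/empty tests correct without a separate count field.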
My guess would be a couple of things: the small Xen hypervisor vs. the potentially large Linux kernel, and driver domains. The latter involves putting each driver into its own domain (i.e. a Xen VM, the process equivalent), which means that bad drivers can do less damage to the rest of the system.
Sounds like Tanenbaum vs Torvalds redux...
de Raadt's remarks about virtualization do spring up in my mind.
I wrote a blogpost with some more details about the bug, which you can find here: http://www.insinuator.net/2015/03/xen-xsa-123/
Interesting blog posts (it and the preceding one). It seems that reliably emulating the x86 architecture is made even harder by a few features not found in other popular architectures, like an extra level of indirection on memory access (segment registers and the corresponding segment overrides) and most instructions having a memory-accessing variant (instead of limiting memory access to separate "load" and "store" instructions, plus a few specialized atomic RMW instructions).
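To make that extra level of indirection concrete: an x86 memory operand's linear address folds a segment base on top of the usual base + index*scale + displacement computation. A toy sketch (ignoring segment limits, address-size overrides, and canonical-address checks, all of which a real emulator must also get right):

```c
#include <assert.h>
#include <stdint.h>

/* Toy effective-address computation for an x86 memory operand:
 * linear = segment base + (base + index*scale + displacement).
 * A real emulator must additionally honor limits, overrides,
 * and address-width wrapping. */
uint64_t linear_addr(uint64_t seg_base, uint64_t base,
                     uint64_t index, unsigned scale, int32_t disp)
{
    uint64_t offset = base + index * scale + (int64_t)disp;
    return seg_base + offset;
}
```

Because nearly every instruction can take an operand of this form, the emulator has to thread this computation (plus the access-check logic) through a very large number of code paths, which is exactly where such bugs hide.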
Interesting post, thanks. By the way, your site seems very slow to load from Australia; it might benefit from a CDN if you don't already use one. (I usually recommend Cloudflare to people; their free service is great.)
Stupid question: Why does Xen need to emulate x86?
tl;dr: Windows. But also various limitations of how PV and HVM guests work.
PVH fixes this so you don't need emulation in Linux, but it's a brand-new feature and probably not production quality. Read this (it's not too technical): http://wiki.xen.org/wiki/Virtualization_Spectrum
As of this writing, Xen 4.4 and Linux 3.14 have experimental support for PVH DomUs and Xen 4.5 has support for PVH Dom0s. PVH allows practically native-hardware-speed guests without any emulation.
Completely eliminating emulation isn't always desirable, unfortunately. The problem with PVH is that it forces the use of event channels to deliver interrupts.
Local APIC emulation is fully accelerated on modern processors, which is a big win for any use case that is heavy on interrupts.
Well, guess my Linode's getting rebooted again.
As far as I understand, this is the public release of the issue that caused the bigger providers to force the reboots on their customers. This means that there won't be any more reboots (for these issues, at least).
Ah OK, that makes sense.
> Well, guess my Linode's getting rebooted again.
Things happen. We should be glad that these sorts of security problems are being found and addressed; it would be naive to believe Xen or any other large codebase has zero security problems.
VM's should be regularly patched anyway, which usually requires a reboot now and then. If it were a physical server, the same would be true; just because things are in the "cloud" now doesn't mean they will have infinite uptime.
I understand that sometimes these services reboot and patch with little notice, but this should be built into any Terms of Service constructed with your client(s); e.g. "We will patch and reboot your service for critical vulnerabilities as quickly as possible, which in some circumstances may leave only short notice."
Not to mention, my physical servers take ages to reboot considering all the BIOS and RAID checking they do. My Linodes, being VMs, literally boot in like 10 or 15 seconds. Maybe less. That's really minor downtime. My HP DL380s take several minutes.
You can't migrate to a new kernel on Linode without a reboot anyway, so if you're proud of a 12+ month uptime, you're running a vulnerable kernel.
Yep. Plus, if you're doing VMs correctly, you should be able to spin up new ones quickly to cover the ones that go down.
Depends. My physical server isn't shared with anyone. Most local exploits are not a particular worry. A security vuln, almost by definition, requires a shared resource. No sharing, no caring.
That would depend on how "local" a "local exploit" is. If they require physical access to your system, well, then that's one thing. But if "local" is to mean on the network, that's far simpler for an attacker to pull off.
> A security vuln, almost by definition, requires a shared resource. No sharing, no caring
You would only not care if your servers have zero access to the internet and are air-gapped from the rest of your network (and even then it's been proven that some vulnerabilities can be exploited to gain access).
"local" is generally understood to mean "not the network".
If you're really worried about having to deal with reboots, you can run Terminal on top of Linode and gain the ability to live-migrate all of your workloads (so you never have to take down your application because of the underlying metal rebooting).
I discussed it in detail previously on HN [0], but we give you the ability to live-migrate your workloads, even onto heterogeneous kernels. If that's something you really need, you can get it from Terminal today.
Meh, it was just a snarky comment. I don't really care; I'm not doing much with my Linode anyway. I'm mostly just using it as a shellbox/IRC client/Mercurial backup.
If they have one spare xen host they can live migrate all guests from one host to the spare, patch the original host, reboot, then live migrate the spare's guests back to the original, and repeat. Patching them all and rebooting them all at once might be quicker though.
Do you need to migrate them back? Would be faster to just use the newly patched original host as the new spare, and repeat like that.
This becomes a physics problem; it's a race between how quickly VMs can be migrated and when the embargo is lifted.
The XenServer toolstack supports exactly this mode of operation in a pool of physical hosts via a 'host evacuate' operation that live relocates VMs away and brings them back once the host upgrade is complete.
http://docs.vmd.citrix.com/XenServer/5.0.0/1.0/en_gb/install...
The other advisories also released today:

"Non-maskable interrupts triggerable by guests" http://xenbits.xen.org/xsa/advisory-120.html

"Non-standard PCI device functionality may render pass-through insecure" http://xenbits.xen.org/xsa/advisory-124.html
It's worth noting that most of these bugs are right in the innards of the x86 emulation. Xen/ARM is a breath of fresh air, since they took the decision to only support the new ARMv7 virtualization extensions. This eliminates the need for QEMU running in dom0 per VM, and for the instruction-emulation plumbing.
We've got a distribution of Xen 4.4/ARMv7/ubuntu for anyone curious to try it out on a cheapo Cubieboard2 or Cubietruck over at https://github.com/mirage/xen-arm-builder (with prebuilt SDcard images at http://blobs.openmirage.org)
Why is it that Xen seems to have so many security issues compared to KVM?
More production use means more attention. Plenty of security issues everywhere.
In particular when you're developing in languages which are insecure by design.
Insecure code can be written in any language.
Just like any car can crash. However some cars are more dangerous than other cars, just as some programming languages are more likely to produce insecure code than other programming languages.
Nobody is proposing re-writing a hypervisor in Java or Python, but C/C++ isn't the only game in town anymore for unmanaged code, and the alternatives are designed from the ground up with security in mind.
> Just like any car can crash. However some cars are more dangerous than other cars, just as some programming languages are more likely to produce insecure code than other programming languages.
> and the alternatives are designed from the ground up with security in mind
The RMS Titanic was billed as one of the safest ships on the sea -- yet due to poorly implemented protocols and practices, negligent leadership, and disregard for best practices, it resulted in one of the most catastrophic maritime disasters.
Using the most "secure" programming language in the world, one can still design very insecure code. Conversely, using the most "insecure" programming language in the world, one can still design very secure code. This would boil down to the skill of the engineers, competence of leadership and adherence to best practices.
The RMS Titanic sank, but the engineers made it much harder to sink than the ocean liners which preceded it.
C/C++ starts you in a position where it is extremely easy to write insecure code. Even a competent coder can produce insecure code in either language without a great deal of effort or stupidity on their part.
Other languages aren't "unsinkable", to come back to the Titanic, but they make it harder to sink, and the requirements on the developer aren't as high. Just as with the Titanic, you have to hit the iceberg in a certain specific way to sink, rather than sinking from any old collision.
Certain C/C++ compilers have definitely made the situation better when in "strict mode", as has a lot of tooling for identifying potential problem points. However, ultimately the language is plagued by "undefined behaviour" and by a large code base where developers use various insecure tricks to save pennies (e.g. this exact exploit, where insecure code saves a single structure's worth of memory, which at $60 per 8 GB stick of RAM is worth less than half a cent).
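To ground the claim that C makes insecure code easy: the classic unchecked string copy compiles cleanly yet silently corrupts memory on long input, while the safe version costs exactly one length check. A generic sketch, not code from Xen:

```c
#include <assert.h>
#include <string.h>

/* A bare strcpy(dst, src) compiles without complaint and overflows
 * dst whenever src is too long, which is undefined behaviour. The
 * defensive variant below refuses instead of overflowing. */
int copy_checked(char *dst, size_t dstlen, const char *src)
{
    size_t n = strlen(src);
    if (n >= dstlen)
        return -1;               /* would not fit, counting the NUL */
    memcpy(dst, src, n + 1);     /* copy the string plus terminator */
    return 0;
}
```

Nothing in the language forces the check; the compiler accepts both versions equally happily, which is the point being made above.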
> C/C++ starts you in a position where it is extremely easy to write insecure code.
It's worth mentioning that K&R, which used to be where people learned C, has a huge number of instances of very risky practices.