How Intel Virtualisation Works

binarydebt.wordpress.com

160 points by bytefire 7 years ago · 31 comments

I haven't read this article yet, but this seems like a good moment to show you all what a friend of mine made. I think it's really cool.

He created a memory allocator in which it is impossible to create dangling pointers. He did it by becoming the kernel through Intel VT-x (i.e. his code runs in ring 0). He uses libdune for this, which in turn uses Intel VT-x.

Check it out at: https://dangless.gaborkozar.me/

I'm going to write an ascii diagram in the upcoming edit. For now: I'll just leave you with the legend that my friend made.

Note: my friend made a video and slides. So for people who are interested, his slides and videos are much nicer to look at than this diagram.

DIAGRAM (of all the physical and virtual memory)

|1|<-A->|2|<-B->|3|<-C->|4|

LEGEND

1 = host physical memory

2 = host virtual memory

3 = guest physical memory

4 = guest virtual memory

A: normal host pagetable

B: extended page table (EPT; this is the VT-x thingy)

C: guest page table (this is what I mess with)
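
A rough sketch of the diagram as code, in Python for brevity. It follows the diagram's ordering (C, then B, then A), which matches how the legend presents it; note that in hardware the EPT maps guest-physical addresses straight to host-physical ones. The single-level dict "page tables", the page numbers, and all function names here are made up for illustration.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def translate(table, addr):
    """Translate addr through a toy single-level table (page number -> page number)."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in table:
        raise LookupError(f"page fault at {addr:#x}")
    return table[page] * PAGE_SIZE + offset


# Hypothetical mappings, one per arrow in the diagram.
guest_page_table = {0x10: 0x20}  # C: guest virtual  -> guest physical
ept = {0x20: 0x30}               # B: guest physical -> host virtual (per the diagram)
host_page_table = {0x30: 0x40}   # A: host virtual   -> host physical


def guest_virtual_to_host_physical(gva):
    """Walk C, then B, then A, as in the diagram."""
    gpa = translate(guest_page_table, gva)
    hva = translate(ept, gpa)
    return translate(host_page_table, hva)


# Only table C has to change to give the same memory a second guest-virtual alias.
guest_page_table[0x11] = 0x20
```

Here `guest_virtual_to_host_physical(0x10 * PAGE_SIZE + 5)` and `guest_virtual_to_host_physical(0x11 * PAGE_SIZE + 5)` land on the same host-physical byte, which is the kind of aliasing that messing with table C makes possible.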

cperciva 7 years ago

I'm surprised that doesn't have a larger performance cost, since it's requiring a TLB entry for each memory allocation. I wonder if the benchmarks understate the cost due to being undersized for modern systems.
- drb91 7 years ago
  
  > I'm surprised that doesn't have a larger performance cost,
  For what workload?
twic 7 years ago

Reminds me a bit of the segmentapalooza design of the iAPX 432 architecture:
https://en.wikipedia.org/wiki/Intel_iAPX_432#Object-oriented...
mastax 7 years ago

This is probably interesting enough for its own post.
- mettamage 7 years ago
  
  Done! You can see it here: https://dangless.gaborkozar.me/
  - mettamage 7 years ago
    
    Edit: I was on my phone; that is just his normal website, which I linked to earlier.
    Here is the HN entry on it: https://news.ycombinator.com/item?id=18214738
fulafel 7 years ago

From the slides it seems that there are still dangling pointers, but the addresses will not be reused for valid allocations? Thus mitigating security vulns from the dangling pointers.
crumbshot 7 years ago

That is interesting. I am wondering how this differs in performance from existing page-based approaches used in debugging, such as Page Heap on Windows or Guard Malloc on Mac OS.
DenisM 7 years ago

One could alternatively never reuse addresses and decommit pages once the last object on a page is gone. No need for VT.
You might run out of address space eventually, that might be a good moment to drain current workitems and launch new ones into a replacement process. This would work well for things like web services since each request is relatively short lived.
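
A toy model of the "never reuse addresses, decommit empty pages" idea above, as a sketch only. It tracks addresses and page state in plain Python data structures rather than touching real memory; the class and method names are invented for illustration, and a real allocator would ask the OS to decommit the page instead of adding it to a set.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class NeverReuseAllocator:
    """Bump allocator whose addresses only ever grow, so none is handed out twice."""

    def __init__(self):
        self.next_addr = PAGE_SIZE  # next unused address
        self.live_per_page = {}     # page number -> number of live objects on it
        self.sizes = {}             # live address -> allocation size
        self.decommitted = set()    # pages that were given back

    def _pages(self, addr, size):
        return range(addr // PAGE_SIZE, (addr + size - 1) // PAGE_SIZE + 1)

    def malloc(self, size):
        if size < 1:
            raise ValueError("size must be positive")
        addr = self.next_addr
        self.next_addr += size
        self.sizes[addr] = size
        for page in self._pages(addr, size):
            self.live_per_page[page] = self.live_per_page.get(page, 0) + 1
        return addr

    def free(self, addr):
        size = self.sizes.pop(addr)  # raises KeyError on a double free
        for page in self._pages(addr, size):
            self.live_per_page[page] -= 1
            if self.live_per_page[page] == 0:
                # Last object on the page is gone: decommit it.
                del self.live_per_page[page]
                self.decommitted.add(page)
                # Never allocate from a decommitted page again.
                if self.next_addr // PAGE_SIZE == page:
                    self.next_addr = (page + 1) * PAGE_SIZE

    def is_accessible(self, addr):
        """True while the page holding addr is still committed."""
        return addr // PAGE_SIZE in self.live_per_page
```

In this model a dangling pointer stays readable while its page still holds other live objects, but its address is never given to a new allocation, and once the page empties any access to it would fault.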
bytefireOP 7 years ago

very interesting and creative use of EPT, will read the link. thanks for sharing

bonzini 7 years ago

It's weird to read a blog post about software that you know in and out. There are a few inaccuracies here and there but it's very clear and well done. Kudos!

bytefireOP 7 years ago

thank you that means a lot! please do add any information you think is relevant :)
- bonzini 7 years ago
  
  The bit about TLBs is a bit confusing; it seems like you're talking about a software TLB, but EPT is just a second layer of address translation.
  Also, after moving a VMCS from one physical CPU to another, you have to do VMLAUNCH the first time you start the guest on the new CPU, because you had VMCLEARed it on the old CPU. That's it. :-)
  - bytefireOP 7 years ago
    
    very good, thank you. i'll try to tidy it up
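
The VMLAUNCH-after-migration rule mentioned above can be sketched as a tiny state machine. This is a toy model in Python, not driver code: the `Vmcs` class and `enter_guest` helper are hypothetical, and only the launch-state rules (VMCLEAR resets the state to "clear", VMLAUNCH needs "clear", VMRESUME needs "launched") are taken from how VMX behaves.

```python
class Vmcs:
    """Toy model of a VMCS: tracks its launch state and the CPU it is loaded on."""

    def __init__(self):
        self.launch_state = "clear"
        self.loaded_on = None

    def vmclear(self):
        # Flushes the VMCS to memory and resets its launch state.
        self.launch_state = "clear"
        self.loaded_on = None

    def vmptrld(self, cpu):
        if self.loaded_on not in (None, cpu):
            raise RuntimeError("VMCS must be VMCLEARed on the old CPU first")
        self.loaded_on = cpu

    def vmlaunch(self, cpu):
        if self.loaded_on != cpu or self.launch_state != "clear":
            raise RuntimeError("VMLAUNCH needs a current VMCS in the clear state")
        self.launch_state = "launched"

    def vmresume(self, cpu):
        if self.loaded_on != cpu or self.launch_state != "launched":
            raise RuntimeError("VMRESUME needs a current VMCS in the launched state")


def enter_guest(vmcs, cpu):
    """Enter the guest on cpu and return the name of the instruction that was used."""
    if vmcs.loaded_on != cpu:
        if vmcs.loaded_on is not None:
            vmcs.vmclear()  # in a real driver this runs on the old CPU
        vmcs.vmptrld(cpu)
    if vmcs.launch_state == "clear":
        vmcs.vmlaunch(cpu)
        return "VMLAUNCH"
    vmcs.vmresume(cpu)
    return "VMRESUME"
```

Entering the guest twice on CPU 0 and then twice on CPU 1 gives VMLAUNCH, VMRESUME, VMLAUNCH, VMRESUME in this model.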

burfog 7 years ago

Last I checked, every virtualization driver ignored Intel's overcomplicated design choice. They don't keep things going; if they did then they would clash with each other. Instead, they fully shut down virtualization when the VM isn't running code.

Intel seems to have accepted this state of affairs. On newer chips, it is much faster to enable and disable virtualization.

bonzini 7 years ago

No, this is not true. KVM always keeps VMX on, Xen too even when running paravirtualized guests, Hyper-V does not even have a concept of "the VM not running code". Maybe VMware Workstation and VirtualBox?
- sebazzz 7 years ago
  
  To clarify for Hyper-V: in Hyper-V, even the host OS is virtualized and runs together with the guests at the same level.
- burfog 7 years ago
  
  Well, I'm part of a team that maintains a VMX driver, and we've looked at what the competition did. It's been a while since we did that, so change is possible. Hyper-V might be special.
  We could be talking past each other. Here, to clarify, are 3 methods:
  x. The driver never does VMXOFF.
  y. The driver does VMXON when asked to run a guest. The driver may handle events from the guest (such as page faults or CPUID emulation) without doing VMXOFF, but the driver will do a VMXOFF prior to letting other host processes and drivers run.
  z. The driver does VMXOFF every time the VM exits.
  We found that choice x was not normally used. If it were, then VMX drivers would not be able to coexist with each other. I'm not saying that everybody uses choice z. Choice y is probably also popular.
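
To make the three choices concrete, here is a toy simulation in Python. It only counts how often each policy would toggle VMX over a made-up event trace; the event names ("enter", "exit", "yield") and the `simulate` function are invented for this sketch and do not come from any real driver.

```python
def simulate(policy, events):
    """Count VMXON/VMXOFF for policy 'x', 'y' or 'z' over a list of events.

    Events: "enter" = driver is asked to (re)enter the guest,
            "exit"  = a VM exit is handled inside the driver,
            "yield" = the driver lets other host processes and drivers run.
    """
    vmx_on = False
    counts = {"VMXON": 0, "VMXOFF": 0, "on_while_host_runs": False}
    for event in events:
        if event == "enter":
            if not vmx_on:
                vmx_on = True
                counts["VMXON"] += 1
        elif event in ("exit", "yield"):
            # x never turns VMX off, y does so before yielding, z on every exit.
            must_turn_off = policy == "z" or (policy == "y" and event == "yield")
            if must_turn_off and vmx_on:
                vmx_on = False
                counts["VMXOFF"] += 1
            if event == "yield" and vmx_on:
                counts["on_while_host_runs"] = True
    return counts


TRACE = ["enter", "exit", "enter", "exit", "yield", "enter", "exit", "yield"]
```

On `TRACE`, choice x does one VMXON and leaves VMX on while the rest of the host runs, choice y toggles twice, and choice z toggles three times; neither y nor z holds VMX across a yield, which is the coexistence property discussed above.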
  - mappu 7 years ago
    
    > If it were, then VMX drivers would not be able to coexist with each other.
    Most VMX drivers are unable to coexist (in the sense that VirtualBox, HAXM, Hyper-V, and VMware Player are mutually incompatible / can't be used in the same Windows boot session).
    
    souprock 7 years ago
    
    I just checked a proprietary VMX driver.
    It fully disables VMX before turning interrupts back on.
    If I remember right, it works fine with VMware running on the same machine, so they must be doing likewise. I think I recall problems with Hyper-V, so you are probably right about that one. It looks like Hyper-V is the uncooperative VMX driver that refuses to play nice with others.
pjmlp 7 years ago

To build on bonzini's answer there are multiple types of virtualization.
What mainframes have done for years, and modern PC VMs do, is type 1 virtualization, whereas stuff like VirtualBox is type 2 virtualization.

userbinator 7 years ago

One thing I find annoying about x86 virtualisation is that it already has a mode called V86, introduced in the 386, but instead of extending that with more functionality, they introduced yet another set of instructions, and of course AMD also has its own completely incompatible way to do virtualisation. The nice thing about V86 is it integrates well with the existing task-segment model.

rodgerd 7 years ago

> AMD also has its own completely incompatible way to do virtualisation.
You have this reversed: AMD developed x86-64 virt, and Intel decided to go their own way.
- tedunangst 7 years ago
  
  But VT-x was released November 2005 and AMD-V released May 2006?
bytefireOP 7 years ago

hi userbinator :) isn't the purpose of virtual 8086 mode somewhat different? i.e. to run real mode applications while the cpu is in protected mode? or did you mean that virtual 8086 could be generalised into a wider virtualisation system?
- userbinator 7 years ago
  
  > or did you mean that virtual 8086 could be generalised into a wider virtualisation system?
  Yes, if you look at the way V86 is implemented, it wouldn't be too hard to extend it to full virtualisation --- something like a "VMX mode task" would've been ideal.
  - bytefireOP 7 years ago
    
    i see, makes sense. maybe a different team from the V86 one worked on it? Conway's law :)

ceautery 7 years ago

Since Intel chips are really RISC underneath the hood, I wonder what crazy x86 emulation hoops they have to jump through already.

bytefireOP 7 years ago

good point. maybe the central idea of how it's implemented isn't too bad: i see the hypervisor as a sort of OS kernel for VMs, and the transitions from VM to hypervisor (VM exits) as akin to syscalls. of course there is more, but the above analogy is the basic idea and other things get added along the way
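
The "VM exits are like syscalls" analogy can be sketched as a dispatch loop. This is a toy in Python under heavy assumptions: the guest is faked as a list of (exit reason, payload) pairs, and the reason names, `HANDLERS` table and `run_guest` function are all invented for illustration rather than taken from any hypervisor.

```python
def handle_cpuid(leaf):
    # A real hypervisor would fill in guest registers; here we just describe it.
    return f"emulated leaf {leaf}"


def handle_hlt(_payload):
    return "stop"


# Hypothetical dispatch table, playing the role a syscall table plays in a kernel.
HANDLERS = {"cpuid": handle_cpuid, "hlt": handle_hlt}


def run_guest(guest_exits, handlers=HANDLERS):
    """Dispatch each faked VM exit to its handler until the guest halts."""
    log = []
    for reason, payload in guest_exits:  # stands in for entering the guest and getting an exit back
        handler = handlers.get(reason)
        if handler is None:
            log.append(("unhandled", reason))
            break
        log.append((reason, handler(payload)))
        if reason == "hlt":
            break  # guest asked to stop; do not re-enter
    return log
```

Calling `run_guest([("cpuid", 1), ("hlt", None), ("cpuid", 2)])` handles the first two exits and never reaches the third, much as a kernel services one syscall at a time before returning to the caller.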
