xv6 Hacking : Virtual Memory


We can solve any problem by introducing an extra level of indirection.
– David J. Wheeler

Consider the following user-space program, which makes a system call getpinfo.

#include "types.h"
#include "user.h"
#include "pstat.h"

int main(int argc, char *argv[]) {
    struct pstat p;
    getpinfo(&p);
    exit();
}

getpinfo is a syscall that accepts a pointer to a struct and populates it with some attributes of all running processes (a bit like the ps command).
The snippet below shows the kernel code where this copying happens. ptable is a kernel data structure.

void copypinfo(struct pstat *dest)
{
    for (int i = 0; i < NPROC; i++) {
        dest->inuse[i] = ptable.stat.inuse[i];
        dest->pid[i] = ptable.stat.pid[i];
        dest->tickets[i] = ptable.stat.tickets[i];
        dest->ticks[i] = ptable.stat.ticks[i];
    }
}

Two observations about the above snippet:

  • The struct that dest points to was allocated on the stack of the userspace program (p is a local variable in main).
  • getpinfo populates this struct in kernel mode (all syscalls run in kernel mode). This implies that while the kernel is running, it can transparently access both its own memory and the memory of the user program that invoked the syscall.

Translation Process

First, let’s run the userspace program, put a breakpoint in the copypinfo function shown above, and print some useful memory addresses.

The address of dest (the memory address passed from userspace to syscall) is 0x00002bc0. This is a virtual address.

The address of ptable (the kernel’s process table) is 0x80112d40. This is also a virtual address.

The cr3 register contains the value 0xdf23000. This is a physical address. It was loaded into the register when this particular process was scheduled; when some other process is scheduled, a different value will be loaded into cr3.

The value in cr3 is the base address of the page directory. The x86 hardware uses this address as the starting point and walks the two-level paging structure to get the physical address.

x86 two-level paging can be described as follows:

  • Both virtual memory and physical memory are divided into units called pages. xv6 uses a page size of 4096 bytes (x86 hardware also supports 2 MB and 4 MB pages).
  • A virtual page is mapped to a physical page. Assembly instructions use virtual addresses, which the hardware translates into physical addresses before it accesses memory.
  • Each process has its own page directory and page tables.
  • cr3 points to the Page directory of the current process.
  • A Page directory contains 1024 32-bit entries, each pointing to a Page table.
  • Each Page table contains 1024 32-bit entries, each pointing to a Page.
  • Each Page is of size 4096 bytes and each byte is addressable.

Userspace Virtual Address Translation Example

A virtual address (VA) is 32 bits long and, for the purpose of translation, is divided into three parts.

  • The first 10 bits are used as an index into the Page Directory to select 1 of the 1024 entries.
  • The next 10 bits are used as an index into the Page table to select 1 of the 1024 entries.
  • The last 12 bits are used as an offset into the physical page to select 1 of the 4096 bytes.

Note that a Page directory and Page table are of size 4096 bytes (1024 entries * 4 bytes per entry) and thus neatly fit in a physical page themselves.
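The three-way split described above can be sketched in C with shifts and masks. The helper names below are hypothetical and chosen for illustration; xv6's mmu.h has similar PDX and PTX macros.

```c
#include <stdint.h>

/* Hypothetical helpers (not xv6 code) that split a 32-bit virtual address. */

/* Top 10 bits: index into the page directory (0..1023). */
static inline uint32_t pd_index(uint32_t va) { return (va >> 22) & 0x3FF; }

/* Middle 10 bits: index into the page table (0..1023). */
static inline uint32_t pt_index(uint32_t va) { return (va >> 12) & 0x3FF; }

/* Low 12 bits: byte offset inside the 4096-byte page. */
static inline uint32_t page_offset(uint32_t va) { return va & 0xFFF; }
```

For the address 0x00002bc0, these helpers return 0x0, 0x2 and 0xbc0 respectively.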

Userspace VA – 0x00002bc0

          10 bits       10 bits       12 bits
Binary    0000000000    0000000010    101111000000
Hex       0x0           0x2           0xbc0

In this case the first 10 bits are all zero, which means we need the first entry in the page directory, located at the physical address held in cr3.

Using the QEMU monitor, we read the value 0x0dee1027 from that entry.
(xp /hw <addr> returns the 4 bytes of memory starting at the physical address <addr>)

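Each entry, whether it is in the Page Directory or in a Page table, has the same format: the upper 20 bits hold the physical page number (PPN) and the lower 12 bits hold flags.

0x0dee1027 in binary is shown below.

PPN (20 bits)           Flags (12 bits)
00001101111011100001    0000 0010 0111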

The flags indicate that this entry is present, writable and belongs to the user.
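As a minimal sketch, an entry can be decoded in C as follows. The three flag values are the standard x86 bit positions (xv6's mmu.h uses the same PTE_P, PTE_W and PTE_U names); the entry_* helper names are hypothetical.

```c
#include <stdint.h>

/* Flag bits of a 32-bit x86 paging entry. */
#define PTE_P 0x001u  /* present */
#define PTE_W 0x002u  /* writable */
#define PTE_U 0x004u  /* accessible from user mode */

/* Hypothetical helpers for illustration. */

/* Upper 20 bits: physical page number (PPN). */
static inline uint32_t entry_ppn(uint32_t e) { return e >> 12; }

/* Lower 12 bits: flags. */
static inline uint32_t entry_flags(uint32_t e) { return e & 0xFFF; }

/* Physical base address of the page this entry points to (PPN followed by 12 zero bits). */
static inline uint32_t entry_base(uint32_t e) { return e & ~0xFFFu; }
```

For 0x0dee1027, entry_ppn gives 0x0dee1, entry_flags gives 0x027 (which has PTE_P, PTE_W and PTE_U all set), and entry_base gives 0x0dee1000.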

Now, to get the Page table entry we will use the first 20-bits of 0x0dee1027 as the base address of the page table and the next 10-bits of the userspace VA as the index.

0x0dee1000 – Base Address of the Page Table

0000000010 (2 in decimal) – Index into the Page table. Each PTE is 4 bytes (32 bits) long, so the entry starts at byte offset 2 * 4 = 8.

The physical address of the PTE is therefore 0x0dee1000 + 0x8 = 0x0dee1008.

The value at this address is 0x0dedf067. Its binary breakdown into PPN and flags is shown below.

PPN (20 bits)           Flags (12 bits)
00001101111011011111    0000 0110 0111

Now we take the first 20 bits (PPN) of this entry and the last 12 bits of the original VA we are trying to translate, and combine them to get the actual physical address.

0x00002bc0 – Virtual Address
0x0dedfbc0 – Translated Physical Address

You can see that the last 3 hex digits (the last 12 bits in binary) are the same, because both are offsets into the 4096-byte page.

We can validate our translation using the QEMU monitor’s gva2gpa command, which translates a guest virtual address to a guest physical address.
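The whole walk can be sketched in C against a tiny simulated physical memory. This is an illustration under stated assumptions, not how the hardware or xv6 implements it: sample_mem, read_phys and translate are hypothetical names, the memory contains only the two words observed above, and large (4 MB) pages are ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* One 32-bit word of simulated physical memory. */
struct phys_word { uint32_t pa; uint32_t value; };

/* The two entries read in the walkthrough (a stand-in for real RAM). */
static const struct phys_word sample_mem[] = {
    { 0x0df23000u, 0x0dee1027u },  /* page directory entry 0 */
    { 0x0dee1008u, 0x0dedf067u },  /* page table entry 2 */
};

/* Read a word from the simulated memory; unknown addresses read as 0. */
static uint32_t read_phys(uint32_t pa) {
    for (size_t i = 0; i < sizeof sample_mem / sizeof sample_mem[0]; i++)
        if (sample_mem[i].pa == pa)
            return sample_mem[i].value;
    return 0;
}

/* Translate va using the page directory at cr3.
 * Returns 0 and stores the result in *pa, or -1 if an entry is not present. */
static int translate(uint32_t cr3, uint32_t va, uint32_t *pa) {
    uint32_t pde = read_phys(cr3 + ((va >> 22) & 0x3FF) * 4);
    if (!(pde & 0x1))
        return -1;  /* page directory entry not present */
    uint32_t pte = read_phys((pde & ~0xFFFu) + ((va >> 12) & 0x3FF) * 4);
    if (!(pte & 0x1))
        return -1;  /* page table entry not present */
    *pa = (pte & ~0xFFFu) | (va & 0xFFF);  /* PPN plus page offset */
    return 0;
}
```

Calling translate(0x0df23000, 0x00002bc0, &pa) on this sample memory stores 0x0dedfbc0 in pa, matching the result above.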

Kernel’s Virtual Address translation example

Now let's try out the same translation process for a Virtual Address which belongs to the kernel.

Virtual address of kernel data (ptable) – 0x80112d40. Its breakdown is given below.

          Page Dir. Offset    Page Table Offset    Page Offset
Binary    1000000000          0100010010           110101000000
Hex       0x200               0x112                0xd40

Page Directory base address – 0xdf23000

Page Directory Entry address – 0xdf23000 + (0x200 * 0x4) = 0xdf23800

Page Directory Entry value – 0x0df22027. Its breakdown is given below.

PPN (20 bits)           Flags (12 bits)
00001101111100100010    000000100111

Note that this entry is accessible to the user: bit 2 of the flags (counting from 0 at the least significant bit) is 1.

Page Table Entry address – 0x0df22000 + (0x112 * 0x4) = 0xdf22448

Page Table Entry value – 0x00112063

PPN (20 bits)           Flags (12 bits)
00000000000100010010    000001100011

This entry is not accessible to the user: bit 2 of the flags is 0.

Physical address – 0x00112000 + 0xd40 = 0x00112d40

The above translation is also confirmed by the QEMU monitor’s gva2gpa command.
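On x86, a user-mode access is permitted only if the user bit is set in both the page directory entry and the page table entry. A minimal sketch of that check, using hypothetical names (ENTRY_PRESENT, ENTRY_USER, user_can_access) and the entry values observed in the two examples:

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_PRESENT 0x001u  /* bit 0: entry is present */
#define ENTRY_USER    0x004u  /* bit 2: accessible from user mode */

/* Hypothetical helper: true only if both levels are present and user-accessible. */
static inline bool user_can_access(uint32_t pde, uint32_t pte) {
    uint32_t need = ENTRY_PRESENT | ENTRY_USER;
    return (pde & need) == need && (pte & need) == need;
}
```

With the kernel example's values (PDE 0x0df22027, PTE 0x00112063) this returns false, because the page table entry lacks the user bit. With the userspace example's values (PDE 0x0dee1027, PTE 0x0dedf067) it returns true.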

In the kernel virtual address translation example, we saw that a process’s page directory contains its own userspace mappings as well as the mappings for the kernel’s code and data.

This design choice greatly simplifies switching from user mode to kernel mode, because when running in kernel mode the hardware can still correctly translate userspace virtual addresses.

The Layout of Process Memory

The above image shows the layout of both physical and virtual memory.

  • Virtual Memory (user) – User memory starts at the lower addresses. The userspace struct whose VA we translated earlier was on the user stack and therefore had a low address (0x00002bc0).
  • Virtual Memory (kernel) – User memory spans from 0 to KERNBASE (xv6 sets it to 0x80000000); virtual addresses above KERNBASE belong to the kernel. The kernel address that we translated, 0x80112d40, is above 0x80000000.
  • Virtual Memory (kernel has predictable addresses) – In the kernel VA translation example, 0x80112d40 resolved to PA 0x00112d40. This is not a coincidence: xv6 creates the kernel mappings such that a kernel VA can be translated to a PA by subtracting KERNBASE. This simplifies the logic of allocating pages.
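The fixed-offset relationship between kernel virtual and physical addresses can be sketched as below. The function names are hypothetical; xv6's memlayout.h provides V2P and P2V macros for the same purpose.

```c
#include <stdint.h>

#define KERNBASE 0x80000000u  /* first kernel virtual address in xv6 */

/* Hypothetical helpers: valid only for addresses in the kernel's direct mapping. */
static inline uint32_t kva_to_pa(uint32_t va) { return va - KERNBASE; }
static inline uint32_t pa_to_kva(uint32_t pa) { return pa + KERNBASE; }
```

For example, kva_to_pa(0x80112d40) gives 0x00112d40, the same physical address the page table walk produced.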