Pointer-stable Dynamic Arrays


Dynamic arrays, as traditionally taught in computer science, reallocate memory when new elements are appended to the end. This has some undesirable effects when doing systems-level programming:

  1. The performance of the append operation is unpredictable. Appending an element is often very fast, but sometimes it causes the entire underlying block of memory to be reallocated.

  2. After this reallocation occurs, any pointer or reference to an element inside the dynamic array is invalidated.

Data structures that implicitly invalidate pointers make it more difficult to verify the correctness of a program, since most pointer-related bugs stem from assuming a pointer is valid when it isn’t.

This article presents a data structure that preserves the nice properties of classic dynamic arrays — O(1) random access, tightly packed elements, etc. — while completely avoiding the problems of memory reallocation.

Static arrays (i.e., arrays that never grow in size) are very fast and simple to reason about. If the maximum number of elements has a clear upper bound, a static array can be used in place of a dynamically allocated array.

This method is a fairly typical pattern in C programming: memory is over-allocated into a buffer upfront, and any dynamic allocation is essentially an index or slice into that buffer.

While using static arrays comes with the obvious limitation that the amount of memory needed has to be known ahead of time, it also fixes two of the aforementioned problems with dynamic arrays:

  1. The base pointer address never changes and is valid throughout the lifetime of the program, meaning that pointers taken to elements within the array are never invalidated.

  2. Appending an element is always a fast O(1) operation.

Static arrays break down when the number of potential elements in the array is unpredictable. Even if the array only stores a few elements, memory for the maximum number of potential elements must still be allocated. On the other hand, if the array needs to store a very large number of elements, predicting the correct maximum capacity can be difficult.

But what if this were a false dichotomy? The problem with many classic CS data structures is that they assume a malloc/free style of memory allocation, which is far from the only way of doing things. In the next section, we’ll look at some tricks with virtual memory that allow more flexibility when designing data structures with contiguous memory layouts.

Instead of trying to pre-compute what size a statically-allocated array should be, what if it were simply as large as possible?

If a computer has 16GB of available RAM, a natural maximum size of any data structure is 16GB. There is a point where enough memory can be statically allocated to handle any realistic use case.

What may be surprising is that when the program starts executing, the operating system is under no obligation to back the array with any physical memory whatsoever. It’s only when the array is written to that its contents start getting mapped to physical memory.

This effectively means that a static array’s physical memory usage, just like that of a dynamically heap-allocated array, grows on an as-needed basis.

This can be demonstrated more methodically by using an OS API that reports how much of the process’s memory is physically mapped to RAM.

The implicit allocation can be measured by reading the total amount of memory physically mapped to RAM before and after writing one byte to a static array.

On x64, writing 0x1 to any location in a zero-initialized static array causes an implicit allocation of 4096 bytes. This is due to how memory works in modern operating systems.

The linear address space that a program sees is subdivided into 4KB sections, each one mapping to a different 4KB range on physical hardware. The 4KB sections are referred to as pages. This mapping is known as virtual memory.

The operating system can be forced to map two virtual pages to physical memory if a program writes two non-zero bytes that are exactly 4096 bytes away from each other.

While most CPUs natively support a page size of 4KB, it’s important to note that this varies depending on CPU architecture. Both Linux and Windows, for example, support large pages, which increase the page size to 2MB or more if the underlying processor supports it.

In addition to allocating pages implicitly, the OS also exposes APIs that give programs direct control over virtual memory. This will be the subject of the next section.

On Windows, the primary function for allocating virtual memory is called VirtualAlloc. Memory can be allocated in a straightforward way by doing the following:

VirtualAlloc takes a set of flags as the third argument. When both MEM_RESERVE and MEM_COMMIT are set, it works like any other memory allocator, except that VirtualAlloc allocates pages directly from the operating system, bypassing the heap.

The second argument takes the size of the allocation. Since virtual memory is divided into 4KB pages, allocations are always a multiple of 4KB. In practice, Windows rounds this up further to 64KB for backwards-compatibility reasons.

When only the MEM_RESERVE flag is set, VirtualAlloc reserves a pointer address for future use, without actually allocating any memory.

The program can call VirtualAlloc again with the previously reserved address as the first argument. Setting the MEM_COMMIT flag converts the range from reserved-only into active memory the program can read and write.

The allocation of pointer addresses and the underlying physical memory can be completely decoupled, which allows the program to use memory in a very flexible way.

Another surprising aspect of virtual memory is that it’s possible to reserve several terabytes of pointer address space at once, even if the total amount of physical RAM on the computer is far less than that.

This is possible because reserving pointer address space has relatively little cost. It only guarantees that future calls to VirtualAlloc won’t return a pointer within the reserved range.

It should be noted that virtual address space, while abundant for the needs of most software, does have a limit. A 64-bit process on Windows has a virtual address space limit of 128 terabytes.

While VirtualAlloc is Windows-specific, the same concept applies to almost any modern operating system. The equivalent functionality is available on Linux through the mmap and mprotect syscalls.

Using the tricks of virtual memory, we can create a dynamic array that never invalidates pointers and avoids the heap allocator altogether.

A standard implementation would:

  1. Reserve a large amount of address space, far more than the array could ever possibly use.

  2. When an element is about to be appended, commit more virtual pages if necessary.

  3. Append the element and increase the count by one.

At a minimum, this data structure will need to store a base pointer, the element count, and the amount of memory that has been allocated by VirtualAlloc using MEM_COMMIT.

To initialize the array, we reserve a large section of virtual address space. The base pointer is set to the beginning of the reserved region.

When the virtual array is created, the base pointer technically points to invalid memory. Virtual memory lets us defer allocation until memory is actually needed.

The append operation of a virtual memory-backed array does not require copying or reallocating data. Instead, it simply sets a region of pre-reserved pointer address space as active for reads and writes. This same technique can be generalized beyond typed arrays.

Virtual memory tricks work well with custom allocators, the simplest of which is a linear allocator.

A linear allocator is effectively a stack of bytes that can push and pop from the end. It only needs to store a base pointer, a byte offset, and a total capacity. The offset is bumped forward with each allocation. If the offset ever becomes greater than the total capacity, the allocator has run out of memory.

The problem is that it might not be possible to know what the total capacity should be ahead of time. In this case, we can use virtual memory to first reserve a large section of pointer address space and only commit as necessary.

Optimized linear allocators are just one example of what becomes possible when memory allocation is decoupled from the heap. More complex allocators can be built using the same virtual memory techniques.

Many classic data structures rely on a heap-based allocation strategy. Revisiting that assumption with virtual memory can lead to simpler, faster, and more correct alternatives.

This article showed that both dynamic arrays and linear allocators can be made more efficient by first reserving a large pool of memory and only committing virtual pages as necessary. Virtual memory techniques apply to many other data structures as well.

As with everything, there are trade-offs when using virtual memory in this way. Virtual address space, while abundant for the needs of most software, is limited. When used deliberately, however, it can be an incredibly useful tool.