Last week I wrote about vertigo debt, the gap between shipping code and truly understanding it. This post is about paying some of that debt down.
I’m going to show you how I took an academic paper, understood it deeply enough to implement it, and used AI as a learning collaborator rather than a code generator. By the end, you’ll be able to read the screenshot below and know exactly what it means.
I haven’t engaged with real mathematics in nearly two decades. I did A-Level Maths, got a decent grade, and then spent eight years in enterprise sales where the hardest calculation was commission percentages.
When I transitioned to engineering 18 months ago, I got away with it. Most day-to-day code doesn’t require mathematical notation. You need logic, systems thinking, debugging skills. Not proofs.
Then I found this paper.
The paper
In 2021, Andrew Krapivin, an undergraduate at Rutgers, came across a paper called “Tiny Pointers.” He didn’t think much of it at the time. Two years later, he revisited it “just for fun.” [1]
What followed was one of those stories that makes you believe in tinkering. Krapivin wanted to make the pointers even smaller, which meant reorganising how data was stored. That led him to hash tables. And his explorations led him to accidentally disprove a 40-year-old conjecture.
In 1985, Andrew Yao [2] claimed that for a certain class of hash tables, you couldn’t do better than uniform probing (randomly checking slots until you find what you’re looking for). The fuller the table, the worse it gets, exponentially so. For four decades, most computer scientists assumed he was right.
Krapivin didn’t know about Yao’s conjecture. He just kept tinkering. When he showed his results to faculty mentors, the reaction was immediate. “You’ve completely wiped out a 40-year-old conjecture.” [3]
The paper, published in January 2025, describes “elastic hashing”: a technique whose worst-case expected lookup cost grows with log x instead of x, where x measures how close to full the table is (x = 100 at 99% capacity). At 99% capacity, that’s the difference between ~7 probes and the thousands a badly clustered table can need.
To understand why that matters, we need to start from first principles.
Hash tables from scratch
If you already know how hash tables work, skip to the next section. But even if you do, the recap might be useful. It sets up exactly what problem the paper solves.
Say you have a list of users. You store them in an array:
users[0] = "alice";
users[1] = "bob";
users[2] = "josh";
Retrieval is instant if you know the index. users[2] gives you josh immediately.
But what if you need to find a user by name? You don’t know where josh is. With a million users, you might check a million slots. (Binary search fans, hold your horses.)
What if we could turn the name directly into an index? Some function that takes "josh" and spits out 2. Then we could jump straight there.
That’s a hash function. It turns any key into a number:
fn hash(key: []const u8) u64 {
    var h: u64 = 0;
    for (key) |char| {
        h = h *% 31 +% char; // *% wraps on overflow
    }
    return h;
}
hash("josh") // → 7482918374 Take that number modulo the array size, and you have an index:
7482918374 % 1000 // → 374
Now users[374] holds Josh’s data. Lookup is O(1), constant time, no matter how big the array. Done.
Except: what happens when two keys hash to the same index?
hash("josh") % 1000 // → 374
hash("emma") % 1000 // → 374 collision!
This is the collision problem. How you solve it determines everything about your hash table’s performance.
Linear probing: simple but fragile
The simplest approach: if slot 374 is taken, try 375. Then 376. Keep going until you find an empty slot.
insert("josh"): slot 374 ✓
insert("emma"): slot 374 taken → 375 ✓
insert("alex"): slot 374 taken → 375 taken → 376 ✓
This works. It’s cache-friendly (sequential memory access). CPUs love it. [4]
But there’s a problem. As the table fills up, collisions cluster. One collision creates a “hot spot” that makes future collisions more likely in that area. At 50% load, you barely notice. At 90% load, it’s sluggish. At 99% load?
Try it yourself. Watch the clusters form:
[Interactive demo: Linear Probing. Watch clusters form as the table fills up: click +1 a few times, then Fill to 90% to see clustering in action.]
At larger scales this gets worse. With 100,000 entries, worst-case lookups can hit tens of thousands of probes. [5]
Why 99% load matters
You might ask: why run a hash table that full? Just resize it.
Sometimes you can’t. You’ve got a fixed memory budget and need to use every byte:
- Embedded systems: No dynamic allocation
- Caches: You paid for 4GB, you want to use 4GB
- Real-time systems: Resizing causes latency spikes you can’t afford
- Memory-mapped files: Hash table on disk, shared between processes
- Game engines, trading systems: Pre-allocated pools, no malloc in the hot path
In these scenarios, you allocate once and live with it. When your data grows to 99% of capacity, you need a hash table that doesn’t fall apart.
That’s what the paper delivers.
Hitting the wall
When I opened the PDF, I wanted to understand the solution. But I hit a wall immediately:
Our construction will make use of a specific injection φ : Z⁺ × Z⁺ → Z⁺ … There exists an injection φ : Z⁺ × Z⁺ → Z⁺ such that φ(i, j) ≤ O(i · j²)
What’s an injection? What’s Z⁺? Why are there so many symbols?
I had two choices: give up, or use AI to help me decode it.
Step 1: Decode the notation
The first barrier isn’t the ideas. It’s the language. Academic maths compresses meaning into dense symbols. Once you unpack it, the ideas are often simpler than they look.
I asked Claude to translate:
Me
What does φ : Z⁺ × Z⁺ → Z⁺ mean? Explain like I’m a programmer who hasn’t done maths in 15 years.
Claude
This defines a function called φ (phi). It takes two positive integers as input and returns one positive integer. In code:
fn phi(i: usize, j: usize) usize

Z⁺ means ‘positive integers’ (1, 2, 3…). The × means it takes a pair. The → means ‘returns’.
Suddenly this wasn’t hieroglyphics. It was a function signature.
Me
And what’s an ‘injection’?
Claude
An injection is a function where every unique input gives a unique output. No two different (i, j) pairs produce the same φ(i, j). In programming terms: it’s a perfect hash. No collisions.
Two questions in, and the scary notation was becoming readable.
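To make “injection” concrete, here’s a classic example: the Cantor pairing function. It isn’t the paper’s φ, just an illustration of the property, distinct pairs in, distinct numbers out:

// Cantor pairing: every distinct (i, j) pair of non-negative integers maps to a
// distinct number. An illustration of an injection, not the φ from the paper.
// (Overflow is ignored for the sake of the example.)
fn cantorPair(i: u64, j: u64) u64 {
    return (i + j) * (i + j + 1) / 2 + j;
}

cantorPair(1, 2) gives 8 and cantorPair(2, 1) gives 7: swap the inputs and the output changes, which is exactly the “no collisions” guarantee.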
Step 2: Understand the claim
With the notation decoded, I could parse Theorem 1:
It is possible to construct an open-addressing hash table that supports n − ⌊δn⌋ insertions in an array of size n…
We can fill the table to (1 - δ) capacity. If δ = 0.01, that’s 99% full.
…that offers amortized expected probe complexity O(1), worst-case expected probe complexity O(log δ⁻¹)…
Average lookups are constant time. The worst case grows only with the logarithm of how close to full you are (log 1/δ). At 99% load: O(log 100) ≈ 7 probes. Compare that to the 16,972 worst-case probes linear probing hit in my benchmarks at the same load.
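To get a feel for that bound at a few load factors, here’s a back-of-envelope sketch. The O(·) hides constants, so read these as the shape of the curve rather than exact probe counts:

const std = @import("std");

pub fn main() void {
    // Worst-case expected probes per the theorem's O(log δ⁻¹) bound,
    // with the hidden constants ignored.
    const deltas = [_]f64{ 0.10, 0.01, 0.001 };
    for (deltas) |delta| {
        const bound = @log2(1.0 / delta);
        std.debug.print("{d:.1}% full -> ~{d:.1} probes\n", .{ (1.0 - delta) * 100.0, bound });
    }
}

Going from 90% full to 99.9% full only roughly triples the bound. That gentleness is the whole appeal.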
This was the hook. High load factor and bounded worst-case? That’s exactly what you need for fixed-capacity scenarios.
Step 3: Understand the mechanism
Here’s where papers get dense. But the core idea is simple: instead of one array, use multiple arrays of decreasing size.
Capacity 1024:
arrays[0] = 512 slots
arrays[1] = 256 slots
arrays[2] = 128 slots
...
arrays[9] = 1 slot
Same total memory. But when you probe for a key, you don’t walk sequentially through one array (which causes clustering). You probe across all arrays in a specific order defined by the φ function.
Why does this help? In linear probing, clusters feed on themselves. One collision makes adjacent collisions more likely. With multiple arrays, a “collision” in array 0 doesn’t affect array 1. The load spreads out naturally.
Here’s the same 50 slots, but organized into tiers:
[Interactive demo: Elastic Hashing. The same 50 slots, spread across multiple tiers: probing spreads across tiers, so watch the worst case stay low.]
Compare the worst-case lookup to the linear probing demo above. Even at 90% load, it stays bounded. At this small scale the difference is modest, but at 100,000 entries the gap becomes dramatic (see the benchmarks below).
The key is interleaved probing. Instead of exhausting one array before trying the next, you check offset 0 across all arrays, then offset 1 across all arrays, and so on.
Think of “offset” as distance from home. At offset 0, you check your ideal slot in all 6 tiers. That’s 6 checks, but you haven’t moved yet. If all 6 are occupied, you try offset 1 in all tiers.
The total checks might be similar to linear probing. So why is this better?
In linear probing, a cluster of 24 slots is a wall you have to walk through. And every new item that hashes into that cluster makes it longer. Clusters feed on themselves. At high load, one cluster can span the entire array. That’s your O(n) worst case.
With elastic hashing, there’s no wall. Being at offset 4 means checking 24 slots, but they’re spread across 6 independent arrays. A “cluster” in tier 0 doesn’t touch tier 1. The clusters can’t merge. The maximum offset is bounded by the tier structure, not by how unlucky your data is.
Same checks, but a ceiling on how bad it can get.
Here’s how the tiers get set up. Each array is half the size of the previous:
// Split 1024 into: 512 + 256 + 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1
var offset: usize = 0;
var size = capacity / 2;
for (0..num_arrays) |i| {
    starts[i] = offset;
    sizes[i] = size;
    offset += size;
    size = @max(size / 2, 1); // halve each time, minimum 1
}

And here’s the lookup. The nested loop structure is the entire insight:
pub fn get(key: K) ?V {
    const h = hash(key);
    // Outer loop: probe offset (1, 2, 3...)
    // Inner loop: ALL arrays at this offset before incrementing
    var offset: usize = 1;
    while (offset <= max_probes) : (offset += 1) {
        for (0..num_arrays) |i| {
            if (offset > sizes[i]) continue; // this tier is too small for this offset
            // offset 1 is the home slot; each higher offset steps one slot further
            const slot = (h +% (offset - 1)) % sizes[i];
            const idx = starts[i] + slot;
            if (keys[idx] == key) {
                return values[idx];
            }
        }
    }
    return null;
}

At offset 1, you check 10 positions (one per tier). Only if all of those miss do you check offset 2. This means worst-case probes grow with the number of offsets needed, not the size of any individual cluster. [6]
Translating the theorem
Let’s return to that screenshot. Here’s what it says now:
Translated
Check out my function φ: it takes an array index and a probe number as inputs, and its output tells you where that probe sits in the overall search order. It prioritises early slots in smaller arrays.
That’s it. A function signature with a growth bound. The dense notation was just compression. The idea underneath is almost disappointingly simple: probe multiple arrays in a carefully chosen order.
Step 4: Go further
I could have stopped there. I had a working implementation in Zig: 305 worst-case probes vs linear probing’s 16,972 at 99% load. A 56x improvement in the tail, same memory.
But I wanted to see how it compared to production code. Benchmarking against Zig’s std.HashMap (which is Swiss-table inspired) sent me down more rabbit holes. [7]
The final implementation combines:
- SIMD 16-slot buckets that scan fingerprints in parallel
- Three-case batch insertion from the paper—dynamically routes inserts between tiers based on fullness, with probe limits derived from the φ priority function
- Comptime specialization that unrolls tier loops when capacity is known at compile time—4x faster lookups
- Bitmask tricks (@ctz) to find empty slots without branching (sketched below)
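Here’s a sketch of that last trick under my own simplified assumptions (a 16-bit mask per bucket, in which bit k is set when slot k is empty); the real implementation differs, but the idea is the same:

// Bit k of empty_mask is set when slot k of a 16-slot bucket is empty.
// @ctz (count trailing zeros) returns the index of the lowest set bit,
// i.e. the first empty slot, with no per-slot branching.
fn firstEmptySlot(empty_mask: u16) ?usize {
    if (empty_mask == 0) return null; // bucket is full
    return @ctz(empty_mask);
}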
The hybrid beats Zig’s std.HashMap at 99% actual capacity—4x faster inserts, with better worst-case lookup latency. Next week I’ll do a deep dive on the implementation: how the batch algorithm actually works, and what I learned about SIMD.
In summary
I didn’t need to understand elastic hashing. I could use Zig’s standard library, or Go’s map, or Python dictionaries, or JavaScript objects forever. But now I understand the tradeoffs, not just the API. Next time I’m designing a cache for millions of writes, I’ll know why it matters, and which hash table design to reach for.
More importantly, I now know I can take cutting-edge computer science and turn it into working code. The notation that scared me was just unfamiliar syntax. With AI as a translator, the gap between “academic paper” and “thing I can build” got a lot smaller.
Remember Krapivin? An undergraduate who disproved a 40-year-old conjecture by tinkering “just for fun.” That’s the door AI opens. Not replacing the tinkering, but making it possible for more of us to try.