Published 13.01.2026 • Last modified 14.01.2026
Recently, I’ve gotten hooked on the game Elite Dangerous, a space flight simulation game which contains a 1:1 scale replica of the Milky Way galaxy. In the game, players can use ships to jump between star systems, with a very limited range (usually around 80 lightyears).
There are around 400 million stars discovered by players. A small fraction, around 4.4 million of them, are especially important for explorers. They are a specific type of star, the neutron star1. I’ve been working on a web app which (among other things) visualizes neutron stars in a galaxy view.

To make the experience of the search box better, I decided to build an autocomplete system that shows suggestions for star names as you type.
This post is about discovering the secrets of efficient in-browser autocomplete, and also optimizing the size of the data structure to its absolute limit.
Many community websites around Elite Dangerous, like inara.cz, have server-side autocomplete because they index 400+ million stars.
I wanted to make the whole app completely client-side, since I only need to index ~4 million star names. I tried to find libraries that solved this, but none of them supported millions of entries like I wanted.
The initial problem #
When initially researching the topic of autocomplete systems, I had the following criteria for the solution:
- Must be able to find suggestions based on a prefix (predictive search/autocomplete)
- Every star name must map to a unique ID in [0, N]
- Every ID should map back to the star name (reverse lookup)
The unique ID requirement is because I want to find star names based on their IDs so that I can display them on screen.
There was really only one obvious choice that fulfilled all of these requirements…
The humble trie data structure #
The trie, short for “retrieval”, is a data structure where each child node shares a common prefix with its parent node. Pictures are worth more than a thousand words as they say, so here is what a trie looks like2:
┏━━━┓ ┌───┐ ┏━━━┓
┌─▶┃ M ┣▶│ A ├▶┃ N ┃
┌──────┐ ┌───┐ ┌───┐ │ ┗━━━┛ └───┘ ┗━━━┛
│ Root ├─▶│ R ├▶│ O ├─┤
└──────┘ └───┘ └───┘ │ ┏━━━┓
└─▶┃ W ┃
┗━━━┛
As you can see, it’s a tree where every child of a node shares the same prefix. The nodes which terminate a word are emphasized. This means that the trie above contains the words “ROW”, “ROM” and “ROMAN”.
This looks simple enough to implement. For higher performance, I’ll be using Rust and compiling it to WebAssembly. Here is the Trie struct in Rust:
struct Trie {
    root: TrieNode,
}

#[derive(Default)]
struct TrieNode {
    children: HashMap<u8, TrieNode>,
    is_terminal: bool,
}
Adding a word to the trie involves iterating through every character of the word, descending down the tree adding new trie nodes as needed, and marking the last one as terminal.
impl Trie {
    fn add(&mut self, word: &str) {
        let mut current_node = &mut self.root;
        for c in word.bytes() {
            current_node = current_node.children.entry(c).or_default();
        }
        current_node.is_terminal = true;
    }
}
Suggesting words in the trie based on the current search word involves traversing through the nodes until the end of the search word is found, and then collecting words from the subtree at that point.
impl Trie {
    fn suggest(&self, prefix: &str) -> Vec<String> {
        let mut current_node = &self.root;
        // Find the node corresponding to the end of the prefix
        for c in prefix.bytes() {
            match current_node.children.get(&c) {
                Some(node) => current_node = node,
                None => return vec![],
            }
        }
        let mut suggestions = Vec::new();
        self.collect_words(current_node, prefix.to_string(), &mut suggestions);
        suggestions
    }

    // recursively collect words in the subtree at `node` into `suggestions`
    fn collect_words(&self, node: &TrieNode, prefix: String, suggestions: &mut Vec<String>) {
        if node.is_terminal {
            suggestions.push(prefix.clone());
        }
        for (&c, child_node) in &node.children {
            let mut new_prefix = prefix.clone();
            new_prefix.push(c as char);
            self.collect_words(child_node, new_prefix, suggestions);
        }
    }
}
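For a quick sanity check, here are the snippets above assembled into one self-contained unit (a sketch matching the code in this post):

```rust
use std::collections::HashMap;

#[derive(Default)]
struct Trie {
    root: TrieNode,
}

#[derive(Default)]
struct TrieNode {
    children: HashMap<u8, TrieNode>,
    is_terminal: bool,
}

impl Trie {
    fn add(&mut self, word: &str) {
        let mut current_node = &mut self.root;
        for c in word.bytes() {
            current_node = current_node.children.entry(c).or_default();
        }
        current_node.is_terminal = true;
    }

    fn suggest(&self, prefix: &str) -> Vec<String> {
        let mut current_node = &self.root;
        // walk down to the node at the end of the prefix
        for c in prefix.bytes() {
            match current_node.children.get(&c) {
                Some(node) => current_node = node,
                None => return vec![],
            }
        }
        let mut suggestions = Vec::new();
        self.collect_words(current_node, prefix.to_string(), &mut suggestions);
        suggestions
    }

    // recursively collect every word in the subtree rooted at `node`
    fn collect_words(&self, node: &TrieNode, prefix: String, suggestions: &mut Vec<String>) {
        if node.is_terminal {
            suggestions.push(prefix.clone());
        }
        for (&c, child_node) in &node.children {
            let mut new_prefix = prefix.clone();
            new_prefix.push(c as char);
            self.collect_words(child_node, new_prefix, suggestions);
        }
    }
}
```

Adding “ROW”, “ROM” and “ROMAN” and then suggesting for “RO” returns all three words (in HashMap iteration order, so sort before comparing).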
I hope that you understand the basic idea of how tries work through these code snippets. From here on, this post focuses more on memory layout and serialization rather than high-level data structures.
Flattening the trie #
The Trie structure above is quite elegant, but does not work for my purposes. I want to be able to store the trie in a file as a sequence of bytes, but HashMap is not suitable for that. The trie needs to be flattened into a linear sequence of TrieNodes.
The trick to flattening the tree is getting rid of the HashMap and adding two new fields, first_child and next_sibling in TrieNode.
struct FlatTrie {
    nodes: Vec<TrieNode>,
}

struct TrieNode {
    ch: u8,
    first_child: i32,
    next_sibling: i32,
    is_terminal: bool,
}
first_child points to the index in the nodes array which contains the first child node. Every node also contains another index, next_sibling, which points to the next sibling, or is -1 if there is none.
┌───┐ ┌───┐ ┌───┐
┌─▶│ 3 ├▶│ 4 ├▶│ 5 │
┌──────┐ ┌───┐ ┌───┐ │ └───┘ └───┘ └───┘
│ Root ├▶│ 1 ├▶│ 2 ├─┤
└──────┘ └───┘ └───┘ │ ┌───┐
└─▶│ 6 │
└───┘
┌──────────┬────┬────┬────┬────┬────┬────┐
│ Index    │ 1  │ 2  │ 3  │ 4  │ 5  │ 6  │
├──────────┼────┼────┼────┼────┼────┼────┤
│ Label    │ R  │ O  │ M  │ A  │ N  │ W  │
│ Terminal │ 0  │ 0  │ 1  │ 0  │ 1  │ 1  │
│ Child    │ 2  │ 3  │ 4  │ 5  │ -1 │ -1 │
│ Next     │ -1 │ -1 │ 6  │ -1 │ -1 │ -1 │
│ sibling  │    │    │    │    │    │    │
└──────────┴────┴────┴────┴────┴────┴────┘
This means that traversing through the tree is slightly more cumbersome, because for each character we might need to go through every child before finding the correct one. This tradeoff, however, allows us to pack everything into a flat Vec<TrieNode> which can be written directly to a file. It also means that deserialization is essentially free: the trie can be loaded into memory and used without any further preprocessing.
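Child lookup over the flattened layout can be sketched like this: the sibling-chain walk replaces the HashMap lookup (helper name is mine, struct as above):

```rust
struct TrieNode {
    ch: u8,
    first_child: i32,
    next_sibling: i32,
    is_terminal: bool,
}

/// Find the child of `node` labeled `c` by walking the sibling chain.
fn find_child(nodes: &[TrieNode], node: &TrieNode, c: u8) -> Option<usize> {
    let mut idx = node.first_child;
    while idx != -1 {
        let cand = &nodes[idx as usize];
        if cand.ch == c {
            return Some(idx as usize);
        }
        idx = cand.next_sibling;
    }
    None
}
```

On the table above, looking up ‘W’ under node 2 first visits node 3 (‘M’) and then follows its next_sibling to node 6.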
This is looking pretty good on paper, but fails in practice: a trie with 4.3 million star names is nearly 300 MB!
This is caused by the fact that all those single-character nodes waste a lot of space. For every single byte of character data, there are 9 bytes of metadata.
Is there a way to reduce the size of each node? Yes, definitely. next_sibling and first_child don’t need the full 32 bits to represent all possible indices; 24 bits or so will do. The ch and is_terminal fields could therefore be bit-packed into the u32 fields. That would definitely reduce the size, but a better idea is to reduce the number of nodes by using radix tries.
Current trie size: 279 MB ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
Radix tries #
To explain the advantage of radix tries, first I will show a list of some of the star names:
Speamoo BX-R d5-1
Speamoo XZ-G d10-0
Assaips NT-Q e5-0
Hypiae Phylio AA-A g0
Phoi Phylaa AA-A g0
Phoi Braea EG-Y g0
It’s apparent that many of the stars share the same prefix. Because of this, there are a lot of nodes with exactly one child node. We can merge each such chain of nodes into a single node, concatenating their characters into one label. This is called a radix trie or compact prefix tree 3.
For example, the radix version of the above trie would look like this:
┏━━━┓ ┏━━━━┓
┌─▶┃ M ┣▶┃ AN ┃
┌──────┐ ┌────┐ │ ┗━━━┛ ┗━━━━┛
│ Root ├─▶│ RO ├─┤
└──────┘ └────┘ │ ┏━━━┓
└─▶┃ W ┃
┗━━━┛
With this method, significantly fewer nodes are required. That’s great, but where should the labels be stored? They can’t be stored in the nodes as fixed-size arrays, because some slots would be left unused, wasting space again.
The solution is quite simple: allocate a new array, labels, into which the label strings are dumped upon insertion. Every node then gets a new field, label_start, along with a label_len.
struct CompactTrie {
    nodes: Vec<TrieNode>,
    labels: Vec<u8>,
}

struct TrieNode {
    label_start: u32,
    first_child: u32,
    next_sibling: u32,
    label_len: u16,
    is_terminal: bool,
}
After rewriting the functions to use the new label system and rebuilding the trie, we’re left with a size of 118 MB. That’s less than half of the original size!
Current trie size: 118 MB ■■■■■■■■■■■■■■■■■□□□□□□□□□□□□□□
Reducing the labels array size #
When building the radix trie, all of the text is naively appended into the labels array. This massively inflates the size of the array because some labels are duplicated multiple times.
After the trie is built, it’s possible to construct a new labels array that contains every label as a substring while being as small as possible in length. This is known as the shortest common superstring problem, and it’s NP-complete (similar to the traveling salesman problem). My friend Rasmus Kirvesmäki solved it with a greedy approximate algorithm which takes a reasonable time and reduces the labels array from 24 MB to just 2 MB!
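Rasmus’s exact algorithm isn’t shown here, but a typical greedy approximation works like this: drop labels that are contained in another label, then repeatedly merge the pair with the largest suffix/prefix overlap. A sketch (not the code used in the app):

```rust
use std::cmp::Reverse;

/// Longest overlap where a suffix of `a` equals a prefix of `b`.
fn overlap(a: &str, b: &str) -> usize {
    let max = a.len().min(b.len());
    (1..=max).rev().find(|&k| a[a.len() - k..] == b[..k]).unwrap_or(0)
}

/// Greedy approximation of the shortest common superstring:
/// drop contained strings, then repeatedly merge the best-overlapping pair.
fn greedy_superstring(mut strings: Vec<String>) -> String {
    strings.sort_by_key(|s| Reverse(s.len()));
    let mut kept: Vec<String> = Vec::new();
    for s in strings {
        if !kept.iter().any(|k| k.contains(&s)) {
            kept.push(s);
        }
    }
    while kept.len() > 1 {
        // find the pair (bi, bj) with the largest suffix/prefix overlap
        let (mut bi, mut bj, mut bk) = (0, 1, 0);
        for i in 0..kept.len() {
            for j in 0..kept.len() {
                if i == j {
                    continue;
                }
                let k = overlap(&kept[i], &kept[j]);
                if k > bk {
                    bi = i;
                    bj = j;
                    bk = k;
                }
            }
        }
        let merged = format!("{}{}", kept[bi], &kept[bj][bk..]);
        let (hi, lo) = if bi > bj { (bi, bj) } else { (bj, bi) };
        kept.swap_remove(hi);
        kept.swap_remove(lo);
        kept.push(merged);
    }
    kept.pop().unwrap_or_default()
}
```

For example, “ROMAN”, “MANIA” and “ROM” collapse into “ROMANIA”: “ROM” is already a substring of “ROMAN”, and the remaining pair overlaps on “MAN”.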
Current trie size: 76 MB ■■■■■■■■■■■□□□□□□
Reducing the size of TrieNode #
At this point, the TrieNode is 16 bytes in size (including padding). For the next optimization, we take advantage of the fact that the trie will be static after construction.
First of all, first_child doesn’t need the full 32 bits available to it, since 23 bits (2^23 ≈ 8 million) is enough to represent all possible node indices. The same goes for label_len: 7 bits is enough. This means that they can be bit-packed into one u32, saving a couple of bytes.
Since the trie will be static at runtime, the layout of the nodes array can be defined in such a way that every sibling node is stored sequentially in the array. If the first child of a node is at index i, the second will be at i+1 and so forth.
This allows us to get rid of the next_sibling field and instead just have a has_next_sibling flag. If the node has a sibling, it’s guaranteed to be in the next index over.
┌───┐ ┌───┐
┌─▶│ 2 ├▶│ 4 │
┌──────┐ ┌───┐ │ └───┘ └───┘
│ Root ├▶│ 1 ├─┤
└──────┘ └───┘ │ ┌───┐
└─▶│ 3 │
└───┘
┌──────────┬────┬────┬────┬────┐
│ Index    │ 1  │ 2  │ 3  │ 4  │
├──────────┼────┼────┼────┼────┤
│ Label    │ RO │ M  │ W  │ AN │
│ Terminal │ 0  │ 1  │ 1  │ 1  │
│ Child    │ 2  │ 4  │ -1 │ -1 │
│ Has next │ 0  │ 1  │ 0  │ 0  │
│ sibling  │    │    │    │    │
└──────────┴────┴────┴────┴────┘
To make this work, you first need to build a regular radix trie. It is then used to construct a second, static radix trie by doing a breadth-first traversal over the first one. This way, every child of a given node ends up next to its siblings in the nodes array.
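Child iteration then becomes a simple forward walk. A sketch with a simplified node struct (field names are illustrative):

```rust
struct TrieNode {
    first_child: i32,
    has_next_sibling: bool,
}

/// With the level-ordered layout, the children of a node occupy consecutive
/// indices starting at `first_child`; walk forward until a node without the
/// `has_next_sibling` flag is reached.
fn children(nodes: &[TrieNode], node: &TrieNode) -> Vec<usize> {
    let mut out = Vec::new();
    let mut idx = node.first_child;
    if idx < 0 {
        return out;
    }
    loop {
        out.push(idx as usize);
        if !nodes[idx as usize].has_next_sibling {
            break;
        }
        idx += 1;
    }
    out
}
```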
Here is a CompactNode with some bit-packing applied:
/// Layout:
/// - label_start (4 bytes)
/// - packed (4 bytes):
/// - first_child: 23 bits (8M nodes max)
/// - label_len: 7 bits (127 chars max)
/// - is_terminal: 1 bit
/// - has_next_sibling: 1 bit
struct CompactNode {
    label_start: u32,
    packed: u32,
}
This one is only 8 bytes while the previous one was 16 bytes.
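The packing and unpacking can be sketched as follows; the exact bit positions are my assumption, following the order in the layout comment above:

```rust
struct CompactNode {
    label_start: u32,
    packed: u32,
}

impl CompactNode {
    /// Pack the fields into one u32 (assumed layout: first_child in the top
    /// 23 bits, then label_len (7 bits), is_terminal, has_next_sibling).
    fn new(label_start: u32, first_child: u32, label_len: u32,
           is_terminal: bool, has_next_sibling: bool) -> Self {
        debug_assert!(first_child < (1 << 23) && label_len < (1 << 7));
        let packed = (first_child << 9)
            | (label_len << 2)
            | ((is_terminal as u32) << 1)
            | (has_next_sibling as u32);
        CompactNode { label_start, packed }
    }

    fn first_child(&self) -> u32 {
        self.packed >> 9
    }
    fn label_len(&self) -> u32 {
        (self.packed >> 2) & 0x7F
    }
    fn is_terminal(&self) -> bool {
        self.packed & 0b10 != 0
    }
    fn has_next_sibling(&self) -> bool {
        self.packed & 0b1 != 0
    }
}
```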
Current trie size: 50 MB ■■■■■■■■□□□
The succinct LOUDS trie #
A 50 MB trie is not that bad, but what if I told you that it can be made 3 times smaller than that? Before going further, I need to explain what succinct data structures are.
As Wikipedia puts it, a succinct data structure is a “data structure which uses an amount of space that is ‘close’ to the information-theoretic lower bound but still allows for efficient query operations”. In practice, this means that if the information-theoretically optimal representation takes N bits, the succinct one should take N plus only a sub-linear overhead, like sqrt(N) or log(N) extra bits.
Most succinct data structures use a bit-vector, which encodes information about the data in individual bits (0 or 1), and operate on it using the following functions:
- rank1(i), which counts the number of ones in the bit-vector up to position i (inclusive)
- select0(i), which gives the position of the ith zero in the bit-vector (counting from 0)
For example, if the bit-vector is 101101000, then rank1(2)=2. Similarly, select0(1)=4.
A naive implementation of rank1 would be to iterate through each bit of the bit-vector and count the number of 1s. Efficient implementations use an auxiliary structure such as Jacobson’s Rank to achieve O(1) queries while only using o(n) (sub-linear) extra space.
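For illustration, here are the naive O(n) versions, using the same conventions as the examples above (rank1 inclusive of position i, select0 0-indexed):

```rust
/// Count the ones in `bits` up to and including position `i`.
fn rank1(bits: &[u8], i: usize) -> usize {
    bits[..=i].iter().filter(|&&b| b == 1).count()
}

/// Position of the i-th zero (0-indexed), or None if there are fewer zeros.
fn select0(bits: &[u8], i: usize) -> Option<usize> {
    bits.iter()
        .enumerate()
        .filter(|&(_, &b)| b == 0)
        .map(|(pos, _)| pos)
        .nth(i)
}
```

On the bit-vector 101101000 these reproduce the examples: rank1(2) = 2 and select0(1) = 4.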
All tree structures can be represented in a succinct form. There are many different succinct representations for trees, but the best one for our case is LOUDS (Level-Order Unary Degree Sequence). That name may seem intimidating, but it’s actually the exact same idea we used previously when ordering the nodes level by level!
In LOUDS, the tree is constructed as follows:
- Traverse the trie in breadth-first order
- For each node visited, store the number of children in the bit-vector by writing a 1 for each child, followed by a 0. For example, a node with two children is 110.
- In another bit-vector, terminals, write 1 if the node is terminal or 0 otherwise.
- Also store the data for each node in an auxiliary vector (label_start and label_len in our case)
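The construction above can be sketched as a breadth-first traversal (checked against the “ROM”/“ROW”/“ROMAN” radix trie from earlier; the Node struct here is a stand-in for the radix trie node):

```rust
use std::collections::VecDeque;

struct Node {
    children: Vec<Node>,
    terminal: bool,
}

/// Build the LOUDS bit-vector and the terminals bit-vector by a
/// breadth-first traversal, exactly as described above.
fn build_louds(root: &Node) -> (Vec<bool>, Vec<bool>) {
    let mut bits = Vec::new();
    let mut terminals = Vec::new();
    let mut queue = VecDeque::new();
    queue.push_back(root);
    while let Some(node) = queue.pop_front() {
        terminals.push(node.terminal);
        for child in &node.children {
            bits.push(true); // one `1` per child
            queue.push_back(child);
        }
        bits.push(false); // terminating `0`
    }
    (bits, terminals)
}
```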
Here is the succinct representation of our example radix trie:
┌───┐ ┌────┐
┌─▶│ M ├▶│ AN │
┌──────┐ ┌────┐ │ └───┘ └────┘
│ Root ├─▶│ RO ├─┤ 10 0
└──────┘ └────┘ │ ┌───┐
10 110 └─▶│ W │
└───┘
0
Succinct: 101101000
Terminals: 00111
Operations on the LOUDS trie #
One interesting property of the LOUDS trie is that indices are implicit: rank1(i) gives each node a unique ID, because every node (apart from the implicit root) is represented as exactly one 1 in the bit-vector. This ID can then be used to read the data associated with the node.
The index of the first child of a node can be found using the following formula:
let first_child_idx = select0(rank1(node_idx) - 1) + 1;
The parent of a node can be found like this:
let parent_idx = select1(rank0(node_idx) - 1);
These are the only operations needed for predictive search and reverse lookup. The children of a node are stored next to each other as per the LOUDS construction, and the number of children is found by counting the consecutive ones starting at the first child’s position in the bit-vector.
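To make the formulas concrete, here is a sketch using naive rank/select helpers (rank inclusive, select 0-indexed, as in the earlier examples), checked against the example bit-vector 101101000:

```rust
fn rank1(bits: &[u8], i: usize) -> usize {
    bits[..=i].iter().filter(|&&b| b == 1).count()
}
fn rank0(bits: &[u8], i: usize) -> usize {
    (i + 1) - rank1(bits, i)
}
fn select0(bits: &[u8], i: usize) -> usize {
    bits.iter().enumerate().filter(|&(_, &b)| b == 0).nth(i).unwrap().0
}
fn select1(bits: &[u8], i: usize) -> usize {
    bits.iter().enumerate().filter(|&(_, &b)| b == 1).nth(i).unwrap().0
}

/// Position of a node's first child in the LOUDS bit-vector.
fn first_child(bits: &[u8], node_idx: usize) -> usize {
    select0(bits, rank1(bits, node_idx) - 1) + 1
}

/// Position of a node's parent in the LOUDS bit-vector.
fn parent(bits: &[u8], node_idx: usize) -> usize {
    select1(bits, rank0(bits, node_idx) - 1)
}
```

In the example, the node “RO” sits at bit position 0, “M” at 2, “W” at 3 and “AN” at 5, so first_child of RO is M’s position and parent of AN is M’s position.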
After switching to a LOUDS-based trie and doing some simple optimizations (splitting labels into simple, 1-character labels and complex ones), I managed to get the size down to around 24 MB. Rasmus did some further optimizations, creating a “palette” which contains label_start and label_len fields for the most common labels (which are duplicated a lot). Indices into label_palette are further split into 8 or 16 bit indices to compact them even more. This reduced the size to a measly 16 MB!
Here is the final struct definition. It’s quite a bit larger than previously, however it’s important to note that the *_select and *_rank structures don’t actually need to be serialized. They can be constructed cheaply at runtime from the bit-vectors.
pub struct LoudsTrie<'a> {
    bits: BitSlice,
    bits_select: succinct::select::BinSearchSelect<JacobsonRank<BitSlice>>,
    terminals: BitSlice,
    terminals_rank: JacobsonRank<BitSlice>,
    // Hybrid palette fields
    complex_labels_8: &'a [u8],   // Indices 0..255
    complex_labels_16: &'a [u16], // Indices 256..65535
    complex_labels_32: &'a [u32], // Stores direct (offset, len)
    complex_is_16: BitSlice,      // 1 = palette index, 0 = direct pointer
    complex_is_16_rank: JacobsonRank<BitSlice>,
    complex_is_8: BitSlice,       // 1 = 8-bit index, 0 = 16-bit index
    complex_is_8_rank: JacobsonRank<BitSlice>,
    label_palette: &'a [u32],     // The actual (offset, len) for the palette indices
    label_store: &'a [u8],
}
Current trie size: 16 MB ■■□□□□□□□□□
Results #
Here you can see the autocomplete in action:
The funny thing about optimizing for size like this is that it also made the prefix search an order of magnitude faster. By measuring latency of WASM calls, I found out that the first version used to take around 17 milliseconds. The final version takes about 0.2 milliseconds.
Conclusion #
Here is a table of the trie types and sizes:
| Trie type | Size |
|---|---|
| Naive flat+u32 | 279 MB |
| Radix | 118 MB |
| Radix+dedup | 76 MB |
| Level-order | 50 MB |
| Succinct | 16 MB |
As you can see, the final trie is around 17x smaller than the initial attempt. It can be further reduced from 16 MB to 12 MB with gzip.
While this reduction in size is impressive, an important thing to note is that it still does not beat gzip compression. If you put every star name into a newline-separated text file, it’s about 66 MB, but if you gzip it, the resulting file is also 12 MB!
So what is the point if it’s going to be the same size over the wire regardless? Well, the point is the unpacked size. We’ve managed to get equal or faster autocomplete4 with a fraction of the memory usage. 66 MB vs. 16 MB matters a lot when memory is already constrained by, e.g., running Elite Dangerous at the same time. Also, if I were to double the number of stars, the plaintext size would double, while the trie would grow much more slowly.
This project took a large part of my winter break to complete, but it was really fun. The whole process was basically just making a file smaller by using more and more elaborate data structures and optimizations. I got to learn about succinct data structures, which are really cool. I hope that you learnt something too while reading this :)
You can find the final trie implementation on GitHub here, with around 1200 lines of code.
I have thought about cleaning it up and turning it into an NPM library and/or Rust crate. It would have a CLI tool which builds the trie file based on the given data and a WASM module which provides functions like suggest(prefix) based on that trie.
AI usage disclaimer #
This post was entirely human-written by me. Gemini 3 and Claude Sonnet 4.5 were used in writing the code.
Appendix #
DAWGs and FSTs #
Before switching to the succinct method, we experimented with lots of ways to reduce the number of nodes. One idea was to share not only the prefixes, but also the suffixes.
┌────┐ ┌───┐ ┌────┐ ┌───┐
┌▶│TEXT├───▶│ING│ ┌▶│TEXT├─┬─▶│ING│
│ └────┘ └───┘ │ └────┘ │ └───┘
┌──────┐ │ ┌──────┐ │ │
│ Root │─┤ ┌───┐ ━━━━▶ │ Root │─┤ │
└──────┘ │ ┌────┐ ┌─▶│ING│ └──────┘ │ ┌────┐ │
└▶│DRIV│─┤ └───┘ └▶│DRIV│─┤
└────┘ │ ┌───┐ └────┘ │ ┌───┐
└─▶│ E │ └─▶│ E │
└───┘          └───┘
Here, the “ING” in “TEXTING” and “DRIVING” can be shared. This can be generalized from single nodes to entire subtrees. Rasmus implemented this using a rolling hash for subtree equality, and managed to massively reduce the number of nodes, bringing the size down from an initial 41 MB to 25 MB!
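As an illustration of the subtree-sharing idea, here is a simplified sketch (the standard library hasher stands in for the rolling hash actually used): each subtree is hashed from its label, terminal flag and child hashes, and subtrees with equal hashes become candidates for sharing.

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct Node {
    label: String,
    terminal: bool,
    children: Vec<Node>,
}

/// Recursively hash each subtree; `counts` records how often each subtree
/// hash occurs, so hashes seen more than once mark shareable subtrees.
fn hash_subtree(node: &Node, counts: &mut HashMap<u64, u32>) -> u64 {
    let mut h = DefaultHasher::new();
    node.label.hash(&mut h);
    node.terminal.hash(&mut h);
    for child in &node.children {
        hash_subtree(child, counts).hash(&mut h);
    }
    let v = h.finish();
    *counts.entry(v).or_insert(0) += 1;
    v
}
```

In the TEXTING/DRIVING example, both “ING” leaves hash identically, so that subtree is detected as shared.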
After doing a bit of research, we found out that this type of data structure has a name: the directed acyclic word graph, DAWG for short.
The difference between a DAWG and a trie is that tries are tree structures and DAWGs are not. Every child node in a tree has exactly one parent, which enables computing the full path from a given node to the root (reverse lookup). This is not possible with DAWGs, because a child node may have multiple parents, so the route from a given node to the root can be ambiguous. This is why we couldn’t continue with DAWGs and had to revert back to using a trie.
A great thing about DAWGs is that they can be efficiently encoded as a “finite-state transducer”. There is a very good Rust crate that implements FSTs, called fst. It allows for some extreme compression on multi-gigabyte datasets, but it’s not optimized for small datasets such as ours.
MARISA trie #
Space-efficient tries are not a completely unexplored field. The state of the art seems to be the MARISA trie. It uses the same LOUDS construction, but it also stores the multi-byte labels recursively within another trie. A MARISA trie with the same star data is also 16 MB in size. That’s an impressive size for a general trie structure, and I am pretty confident that it could be made even smaller with some case-specific optimizations.
The reason why I didn’t end up using it is that it’s implemented as a C++ library, and therefore doesn’t integrate well into my Rust WASM pipeline. Also, it was a lot of fun to write our own trie implementation and see how small we could make it!