ExBoxing: Bridging the divide between tag boxing and NaN boxing

I spent about a decade working with the SpiderMonkey team at Mozilla, and during that time there were a number of VM design ideas I had which I never got the chance to implement.

This is one of them.

Summary

ExBoxing is an approach to value-boxing I developed to address some of the deficiencies between low-bit tagging and NaN-boxing — value boxing techniques used by modern JS virtual machines.

Here, NaN-boxing (used by Mozilla for SpiderMonkey) has the advantage that it can represent all numeric Float64 values directly as immediates. However, the approach comes with numerous downsides.

Low-bit tagging is advantageous to NaN-boxing in many ways, but lacks the ability to represent Float64 values as immediates — forcing heap allocation for them.

ExBoxing is a technique that allows for the representation of the most commonly used Float64 values as immediates, while still using low-bit tagging to distinguish value types.

Tag Boxing

Tag boxing is the more commonly known technique so let’s discuss that first.

Tag boxing leverages the fact that pointers aligned to power-of-two addresses have a certain number of low bits that are guaranteed to be 0. If the VM can guarantee that all heap-object pointers satisfy a particular alignment, then the low bits of a pointer can be used to store a type tag, allowing for distinguishing between “real pointers” and “other encoded values”.

The simplest approach, used by V8, is to use a single low-bit to distinguish between pointers and integers (the most common high-cardinality primitive type). Integer values that fit within the payload bits of the boxed word can be encoded as immediates. Anything else is allocated on the heap and encoded as a pointer to that allocation.

[1] Tag Boxing Diagram

Tag boxing has a lot of nice properties. It doesn’t depend on any architectural property of the system CPU. An aligned pointer on any system will always have some number of low bits that are 0, and allocation alignment can be controlled on any system (in particular increased as much as is needed to ensure the appropriate number of low 0 bits).

Furthermore, testing for type tags encoded in the low bits is easy. On X86–64, this can be encoded as a single “and” operation with an immediate operand encoding the bit-mask.

Lastly, the approach uses only a few bits of the encoding word. For a 64-bit value-boxed word, using a single tag bit leaves 63 bits free for other value representations. If more type variants are desired, as few as 3 type bits are able to directly encode 8 types — leaving 61 bits free for value encoding.

All of this comes with the price that we cannot represent Float64 values as immediates.

Why is It Hard to Encode Floats as Immediates

It behooves us to understand why we can encode some integers as immediates, but not floats. With integers, we’re taking advantage of the fact that in practice, most integer values that flow through programs are relatively “small” — having far fewer than 64 significant bits.

This allows us to select a “common subset” of integers with a smaller significant bit count, and encode those in the payload of our value-boxed word. Of course, this forces us to allocate any integer that falls outside of this range as a heap-allocated object, but this tends to be extremely rare in practice.

The problem with Float64 is that there isn’t a common subset that easily maps to a range of bit-sequences where the high bits are insignificant. Every bit in a float value kind of matters, or so it seems.

NaN Boxing

NaN boxing is used by Spidermonkey, the JS engine inside Firefox. I haven’t seen it used anywhere else.

Its primary benefit it providesis the ability to represent all Float64 values as immediates.

IEEE Float64 Representation

This is accomplished by using the 52-bit “NaN space” that exists in the IEEE spec for Float64 encodings.

IEEE Float64 representation

Any Float64 where all the exponent bits are set, and where the mantissa is a non-zero value, is a NaN value. This allows us to use those bits as an encoding space for both type tags and payloads. There is some care we need to maintain, as at least one number from the 52-bit space has to be reserved to represent a canonical NaN float64, but for the most part we can proceed with a tag + payload approach:

NaN-boxed value representation

However it comes with many downsides.

The biggest problem is that the technique forces a dependency on address space size. In order to encode pointers, we have to rely on the assumption that at least the high 13 bits of the pointer carry no information. This is true on current 64-bit architectures (where pointers only have 48 significant bits), but is a tenuous assumption that cannot be relied upon for the long term.

Another issue is that it uses many bits for type representation (13 + tag bits), and leaves very few bits for value representation (≤ 51 bits).

Lastly, using the high bits of a word to store type tags is cumbersome on some architectures that don’t support easy bit-selection instructions. For example on X86–64, unboxing such a value demands choosing between different compromised solutions:

Use a large constant to mask the type bits directly and compare against a large constant directly.
Shift the high-bits down to low bits, use a small constant to mask the type bits, and compare against a small constant directly.

The first option on X86–64 forces the use of 64-bit immediates, which are not well supported in the ISA. In particular, masking a 64-bit value with a 64-bit immediate requires loading the immediate into a register.

The second option forces a data-dependency as the mask operation now needs the result of the right-shift operation that brings the type bits down.

ExBoxing

ExBoxing, short for exponent boxing, brings the benefits of immediate Float64 encoding to the tagged boxing approach.

The key observation it makes is that most common floats will have exponents with small magnitudes.

Anecdotally, I ran a measurement where I dumped the Float64 values that flow through all of the javascript on a site, and generally browsed around the web, and encountered no Float64s with exponents outside the range [-32, 32].

If we restrict ourselves to a smaller range of exponents, we find that the high bits of the exponent regularize to one of two patterns: 011... and 100....

For example, all Float64s with exponents in the range [-254, 255] will have their exponent high bits begin with 011 or 100. If we can get these patterns into the low bits of a representation, then we can use them as type tags to represent all float64s within a particular type range as immediates.

IEEE Float64s with Restricted Exponents

Getting these high exponent bits into the low bits of a word is easy enough: we can rotate left by 1 + N to get N high bits from the exponent into the low bits of the word.

Rotating a Float64 left to bring exponent bits down

With this transform, we now have access to a subset of Float64s (the most commonly used subset in fact) we can use with low-bit tagging schemes. We don’t even have to use the exponent high bits directly as tags — as long as we have a way of distinguishing two exponent variants from the type tagging scheme, we can move forward.

So for example, we could imagine a tag boxing scheme which uses the low 3 bits of a word to encode types, with the following type assignments:

HeapPointer — 000
HeapInteger — 001
HeapFloat64 — 010
ImmInt61 — 100
ImmFloatLow — 101
ImmFloatHigh — 110

Under this scheme, testing for an immediate float64 would simply involve checking for either of the type tags, and unboxing an immediate float64 would involve adjusting the type tags back to the corresponding exponent bits and then rotating the Float64 back into place

const EXP_ADJUST: u64 = 0b101 - 0b011;
fn unboxImmFloat64(word: u64) -> f64 {
  f64::from_bits((word - EXP_ADJUST).rotate_right(TAG_BITS + 1))
}

And with that, we can get the best of both worlds: low-bit tagging for value-boxing, and the ability to represent most of the float64s we care about as immediates.

Unrelated Nerd Sniping — Immediate Strings

And if we’re gonna use 3 tag bits, and maybe have up to 8 type tags available, maybe we can use a couple of them to store short strings as immediates too? Ruby does this but it uses 128-bit boxed values. There’s no reason we can’t do it with low-bit tagging on VMs that use 64-bit value-boxed representations. A tag bits and a length field of up to 7 can e encoded in less than 8 bits, leaving the remaining 56 bits for encoding immediate string representations.

Immediate String Representation

That would remove a large number of common strings — especially those associated with standard library API names, from the heap and make them immediates. It would reduce memory usage and also help the garbage collector by reducing the heap object count. It would also eliminate the memory-indirection overhead and cache-line pollution caused by .length operations on strings (since those can be computed from the immediate).

Final Thoughts

I’d love to see someone implement this in their VM. I wanted to do it for SpiderMonkey to prepare it for a future with >48 bits of address space, but never got the time to prioritize it among other projects (it’s a lot of work).

If you do end up using the technique for your VM, lemme know how it goes.