If you truncate a UUID I will truncate your fingers


“Oh I just needed something random but also human-readable” you said, as you casually called .Substring(0, 8) on a UUID.

You probably also “casually” mutilate animals like you did to that poor UUID. Great job on that name, too, Shakespeare. Item_019b1999 is going to be the next buzzword all the youths are yelling. Very human-readable.

As if that wasn’t bad enough, you kept the first eight characters and not the last. Do you even know what a UUID is? Let me show you.

Here’s the popular UUIDv7 everyone uses:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           unix_ts_ms                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          unix_ts_ms           |  ver  |       rand_a          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|                        rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

It’s a terrible diagram, so let me simplify that for you:

Image showing that the first 13 characters are not random; they are a timestamp

All of the randomness is at the end!

Okay, okay, the first 12 characters are actually an encoded Unix timestamp with millisecond precision, so only IDs generated within the resolution of that timestamp would collide. Let’s see what happens as you truncate the UUID.

| Length | Interval where all IDs get the same value | Rough human equivalent |
|---|---|---|
| 12 | 1 ms | Camera flash |
| 11 | 16 ms | Monitor refresh |
| 10 | 256 ms | Slow mosquito flaps its wings |
| 9 | ≈ 4 s | Sound of a firework travels one mile |
| 8 | ≈ 1 min | Toweling off body after shower |
| 7 | ≈ 17.5 min | Cartoon show episode |
| 6 | ≈ 4.5 hr | Cook 20lb turkey |
| 5 | ≈ 3 day | Roof replacement |
| 4 | ≈ 50 day | Oldest fruit fly |
| 3 | ≈ 2 yr | Parmesan cheese aging |
| 2 | ≈ 35 yr | Chinese zodiac cycles through all animals nearly three times |
| 1 | ≈ 557 yr | The Ottoman empire |

By truncating your UUID to 8 characters, you’ve ensured that all items generated while I was microwaving my rice have the same value. Congratulations on creating a nightmare.
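If you’d rather check the arithmetic than trust my table, here is a minimal sketch. It assumes the layout in the diagram above (48 timestamp bits up front, so 12 hex characters at 4 bits each). The names uuid7_from_ms and shared_window_ms are made up for illustration, and the UUID is assembled by hand rather than with any particular library’s v7 generator.

```python
import os
import uuid


def uuid7_from_ms(unix_ts_ms: int) -> uuid.UUID:
    """Hand-build a UUIDv7-shaped value for a given millisecond timestamp."""
    rand = int.from_bytes(os.urandom(10), "big")    # 80 random bits, 74 are used
    value = (unix_ts_ms & ((1 << 48) - 1)) << 80    # unix_ts_ms: top 48 bits
    value |= 0x7 << 76                              # ver: 4 bits
    value |= ((rand >> 62) & 0xFFF) << 64           # rand_a: 12 bits
    value |= 0b10 << 62                             # var: 2 bits
    value |= rand & ((1 << 62) - 1)                 # rand_b: 62 bits
    return uuid.UUID(int=value)


def shared_window_ms(prefix_len: int) -> int:
    """Milliseconds during which every UUIDv7 shares its first prefix_len hex chars."""
    if not 1 <= prefix_len <= 12:
        raise ValueError("only the first 12 hex characters are timestamp")
    # Each dropped timestamp character discards 4 bits, so the window grows 16x.
    return 16 ** (12 - prefix_len)


if __name__ == "__main__":
    window = shared_window_ms(8)        # 16**4 = 65536 ms, a bit over a minute
    start = 0x019B1999 << 16            # first millisecond of one such window
    first = uuid7_from_ms(start)
    last = uuid7_from_ms(start + window - 1)
    after = uuid7_from_ms(start + window)
    # Same 8-character prefix for the whole window, a new one right after it.
    print(first.hex[:8], last.hex[:8], after.hex[:8])
```

Everything generated inside that window gets the same eight characters, no matter how random the rest of the UUID was.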

Oh, but Andy, we use UUIDv4 where it’s all random

(Some libraries just call this a UUID but they really mean v4)

  1. Not true; 6 of its 128 bits (about five percent) are fixed version and variant bits, and
  2. You’re missing the point

The reason to use a UUID is for uniqueness. If you don’t want it, generate some random bits yourself.

As you truncate your UUIDv4, here’s how many IDs you can generate until you have a greater than 50% chance of a collision.

| UUID Length (chars) | Number generated | Rough human intuition |
|---|---|---|
| 32 | 2.7 Quintillion | A third of all insects on earth (How???) |
| 31 | 680 Quadrillion | Atoms of gold worth $0.33 ($4,300/oz) |
| 30 | 170 Qa | Kg of mass to power the Sun for 1.3 yrs |
| 29 | 42 Qa | 260k years worth of parcels shipped (161B/yr) |
| 28 | 10.5 Qa | Nanoseconds in 350 years |
| 27 | 2.5 Qa | Volume of Lake Superior (gal) |
| 26 | 660 T | 2x the global real estate market ($) |
| 25 | 165 T | Data center energy usage in 2023 (kWh) |
| 24 | 41.5 T | Cells in a human body |
| 23 | 10 T | Trees on 3.5 earths |
| 22 | 2.5 T | Meters to Uranus |
| 21 | 650 B | 136 years of orders at Amazon (9k/sec) |
| 20 | 160 B | Stars in the Milky Way |
| 19 | 40 B | Neurons in a Gorilla |
| 18 | 10 B | People on earth |
| 17 | 2.5 B | Ping pong balls to fill 16 Olympic pools |
| 16 | 1.3 B | All dogs |
| 15 | 316 MM | People in USA |
| 14 | 79 MM | Travelers passing through LAX each year |
| 13 | 20 MM | 3 days of orders at Amazon (9k/sec) |
| 12 | 20 MM (yes) | 3 days of orders at Amazon (9k/sec) |
| 11 | 5 MM | LEGOs produced every 3 days |
| 10 | 1.2 MM | 30kg of white rice grains |
| 9 | 300 k | Monster energy drinks sold every 4 hours |
| 8 | 80 k | 9 minutes of orders at Amazon (9k/sec) |
| 7 | 20 k | Pickleball courts built in 2024 |
| 6 | 5 k | Pack of staples |
| 5 | 1.2 k | Stack of paper the height of a soda can |
| 4 | 300 | Bag of Dum Dums lollipops |
| 3 | 80 | People on a city bus |
| 2 | 20 | Seconds for a human to urinate |
| 1 | 5 | Your IQ if you truncate a UUID |

So yeah, if you truncated your UUIDv4 to 8 characters at Amazon you’d probably get your PIP in 6 months instead of the 2-year average.
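The numbers in the table above can be roughly reproduced with the standard birthday-bound approximation, n ≈ sqrt(2 · N · ln(1 / (1 − p))), where N is the size of the ID space. This is a sketch under a few assumptions of mine: hyphens are stripped, you keep the first characters, the 13th character is the fixed version nibble, and the 17th carries two fixed variant bits. The function names are hypothetical.

```python
import math


def random_bits(chars_kept: int) -> int:
    """Random bits in the first chars_kept hex characters of a UUIDv4."""
    if not 0 <= chars_kept <= 32:
        raise ValueError("a UUID has 32 hex characters")
    bits = 4 * chars_kept
    if chars_kept >= 13:
        bits -= 4   # character 13 is the version, always '4'
    if chars_kept >= 17:
        bits -= 2   # character 17 holds 2 fixed variant bits
    return bits


def ids_until_collision(bits: int, probability: float = 0.5) -> float:
    """Approximate number of random IDs before a collision is this likely."""
    space = 2.0 ** bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - probability)))


if __name__ == "__main__":
    for chars in (32, 16, 13, 12, 8):
        bits = random_bits(chars)
        print(f"{chars:>2} chars, {bits:>3} random bits: "
              f"{ids_until_collision(bits):,.0f} IDs for a 50% collision chance")
```

Note that 12 and 13 characters give the same answer, because the 13th character adds no randomness at all. That is the “(yes)” in the table.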

And this is just where the probability of a collision crosses the 50% threshold, which you would never even want to get close to. If you’re going to produce 100 billion UUIDs over the lifetime of your app (very realistic in modern enterprise), you want the probability of a collision to be vanishingly small, approaching 0.

For the sake of argument, let’s say that while you YOLO your way through life failing up the entire time, you decide that a 1% probability of collision is “good enough”. After truncating your UUID to 8 characters, you will hit a 1% chance of collision after just 9,300 IDs. That’s the number of steps a Spaniard takes in a day (three times as many as you).

Hell, even if you deigned to allow the UUID to retain half its original length (16 characters), you’d still have a 1% chance of collision after 150 MM, or the number of Snickers bars produced in 10 days.
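Going the other direction, here is a rough sketch of the collision probability for a given number of IDs, using the usual approximation p ≈ 1 − exp(−n(n − 1) / (2N)). The bit counts are my assumptions: 32 random bits for 8 characters, 60 for 16 characters of a UUIDv4 (one of those characters is the version), and 122 for the whole thing. The function name is made up.

```python
import math


def collision_probability(ids_generated: int, random_bits: int) -> float:
    """Approximate chance that at least two of the generated IDs are equal."""
    space = 2.0 ** random_bits
    exponent = ids_generated * (ids_generated - 1) / (2.0 * space)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    cases = [
        ("8 chars, 9,300 IDs", 9_300, 32),
        ("16 chars, 150 MM IDs", 150_000_000, 60),
        ("full UUIDv4, 100 B IDs", 100_000_000_000, 122),
        ("8 chars, 100 B IDs", 100_000_000_000, 32),
    ]
    for label, count, bits in cases:
        print(f"{label}: {collision_probability(count, bits):.3g}")
```

The first two cases land at roughly 1%, matching the figures above. The full-length UUID at 100 billion IDs stays below one in a quadrillion, which is the whole point of not truncating it.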

And you’re still missing the point.

It’s not human readable anyway

  1. End users don’t give a shit what your IDs look like, they’re going to copy and paste if they ever need to (which is roughly never)
  2. Assumed user preference does not DICTATE HOW FUCKED YOUR DATABASE SHOULD BE

Use a different encoding

Did you know you can get shorter IDs without sacrificing uniqueness? UUIDs are hex-encoded (base 16) and only 4 bits of information can fit into each character. If you change your encoding to base 32, wow you can have an ID of length 26 instead of 32. If you used raw base 64, you could get down to 22 characters.

This is the idea behind the TypeID specification: a type-specific prefix followed by a base32 encoded UUID. Now you can generate IDs like Item_01kccskbjfff08mh2ttwpvjf9c which are just as human readable as before (meaning kind of but not really) without sacrificing the entire reason for a UUID’s existence.
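To make the length claims concrete, here is a minimal sketch that re-encodes all 128 bits instead of throwing any away. It assumes a lowercase Crockford-style base32 alphabet (no i, l, o, or u) in the style of TypeID; it is an illustration, not a validated implementation of that spec, and the helper names are my own.

```python
import base64
import uuid

# Assumed alphabet: Crockford-style base32, lowercase.
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"


def to_base32(u: uuid.UUID) -> str:
    """Encode the 128-bit value as 26 base32 characters (5 bits each)."""
    n = u.int
    chars = []
    for _ in range(26):
        chars.append(ALPHABET[n & 0x1F])
        n >>= 5
    return "".join(reversed(chars))


def from_base32(text: str) -> uuid.UUID:
    """Decode 26 base32 characters back into the original UUID."""
    if len(text) != 26:
        raise ValueError("expected 26 characters")
    n = 0
    for ch in text:
        n = (n << 5) | ALPHABET.index(ch)   # raises ValueError on a bad character
    if n >> 128:
        raise ValueError("value does not fit in 128 bits")
    return uuid.UUID(int=n)


def to_base64(u: uuid.UUID) -> str:
    """Encode the 16 raw bytes as 22 URL-safe base64 characters, padding removed."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def typed_id(prefix: str, u: uuid.UUID) -> str:
    """Prefix plus base32 body, in the spirit of the example above."""
    return f"{prefix}_{to_base32(u)}"


if __name__ == "__main__":
    u = uuid.uuid4()
    print(len(u.hex), len(to_base32(u)), len(to_base64(u)))   # 32 26 22
    print(typed_id("item", u))
    assert from_base32(to_base32(u)) == u                     # nothing was lost
```

Shorter string, same 128 bits, and it round-trips. No fingers harmed.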

On human-readable IDs

Give up. Or at least give up on your end users easily remembering (or caring). Even random phrase generators that puke out something like “Parchment-Pellet-Closeable-Whoopee” only sound human readable at first. Without looking back, can you remember the alleged human-readable passphrase I just mentioned? Didn’t think so.

For you and all the coworkers that loathe you for truncating UUIDs, okay yeah maybe TypeIDs are helpful. These allow you to parse into stronger types like class FistId so that someone can’t accidentally use a FistId when they should have been using a FaceId. And it makes reading logs way easier.
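As a rough sketch of what those stronger types could look like: the TypedId base class and its parse method below are hypothetical, and the body is plain hex to keep the example short (a real TypeID would use base32).

```python
import uuid
from dataclasses import dataclass
from typing import ClassVar


@dataclass(frozen=True)
class TypedId:
    """A UUID that remembers what kind of thing it identifies."""
    value: uuid.UUID
    prefix: ClassVar[str] = ""

    @classmethod
    def parse(cls, text: str):
        """Parse 'prefix_hexuuid', rejecting IDs that belong to another type."""
        prefix, sep, body = text.rpartition("_")
        if not sep or prefix != cls.prefix:
            raise ValueError(f"expected a {cls.prefix!r} ID, got {text!r}")
        return cls(uuid.UUID(body))

    def __str__(self) -> str:
        return f"{self.prefix}_{self.value.hex}"


class FistId(TypedId):
    prefix = "fist"


class FaceId(TypedId):
    prefix = "face"


def punch(fist: FistId, face: FaceId) -> str:
    """A type checker can flag callers that swap the two arguments."""
    return f"{fist} -> {face}"


if __name__ == "__main__":
    fist = FistId(uuid.uuid4())
    face = FaceId(uuid.uuid4())
    print(punch(fist, face))
    try:
        FaceId.parse(str(fist))     # a fist is not a face
    except ValueError as err:
        print("rejected:", err)
```

The prefix also shows up in every log line, so you can tell which kind of ID you are staring at without guessing.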

Conclusion

Recently I reviewed some code that tried to cram an entire directory structure worth of IDs into one: {grandparent}_{parent}_{child}_{UUID}, except the entire string was only allowed to be 80 characters long. So it was truncated at the end. Fortunately (unfortunately?) two of the IDs were simple integers up to 8 digits, meaning our string could be 83 characters. So we were truncating the last 3 characters. Of the only part of the ID that made it unique in the first place. Turns out a simple UUID worked just fine and the other information could be gleaned from elsewhere.

For every character of a UUID you lop off, you shrink the ID space by a factor of 16 and cut the number of IDs you can generate before a likely collision by a factor of four (while increasing the odds that I find you and lop off the end of your ring finger). It’s even worse if you’re using a UUID where the first few characters are an encoded timestamp.

Before you truncate a UUID ask yourself a few questions:

  • Are the extra few characters really making anything less readable?
  • Can I use a different encoding or a different kind of ID instead?

If you still decide you must truncate, then at least keep the last characters because they’re likely to give you better odds.

Until next time.