Against the UUID

8 points by firasd 4 months ago · 17 comments

> A UUID is a collection of random characters like f81d4fae-7dec-11d0-a765-00a0c91e6bf6

This statement is just wrong. More accurately, a UUID -can be- a 128-bit number made up of mostly random bits (UUIDv4), which can be represented as a string using a common representation. Sure, that may well be the common case for most, but there are completely non-random UUID versions as well, such as UUIDv1, or, more recently, the somewhat random UUIDv7.

The author's proposed ID system is less defensible against some of the other UUID versions which aren't addressed. But that initial description suggests that the author doesn't have a complete enough understanding of UUIDs to really make a credible case about UUID problems (which do exist) vs. their own proposed system.
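To make the "128-bit number with a version" point concrete, here is a minimal sketch using Python's standard `uuid` module on the very string quoted above. It only inspects that one value; nothing here is taken from the article's own code.

```python
import uuid

# The example string from the quoted article, parsed as a UUID.
u = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")

print(u.version)             # 1  (time- and node-based, not random)
print(hex(u.int))            # 0xf81d4fae7dec11d0a76500a0c91e6bf6
print(len(u.bytes))          # 16 (bytes, i.e. 128 bits)
print(hex(u.node))           # 0xa0c91e6bf6 (the node field of a v1 UUID)
print(uuid.uuid4().version)  # 4  (the mostly random variety)
```

So the article's own example is a version 1 UUID, which is exactly the kind that is not a collection of random characters.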

mikl 4 months ago

This exceedingly uninformed rant makes more sense when you understand that the author has his own timestamp format he wants to push.

Counterpoints:

- A UUID is not random characters; it's a 128-bit number and is stored as such in many databases. It can be presented as a hex string with dash separators, but it doesn't have to be.

- There are several types of UUIDs. UUIDv4 is mostly just random bits. Others have time and machine numbering, like Snowflake IDs. UUIDv7 has a combination of time and randomness.

- UUIDv7 was made to address the database index problem, rendering that point moot.

- Lots of tools understand and/or support UUIDs.

You can complain about the overhead of storing and indexing 128-bit numbers if you want, but realize that a string like 2025_P5U5_326662 is likely also going to be stored as 128 bits. And the added value of having the year in front (the rest is not going to mean anything to the average user) is not that great.
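A rough sketch of the two points above: the UUIDv7 layout (48-bit millisecond timestamp, then version, variant, and random bits) and the storage comparison. `uuid7_sketch` is a hypothetical helper written for illustration, assuming the field layout from RFC 9562; it is not a library function and has no counter for IDs minted within the same millisecond.

```python
import os
import time
import uuid

def uuid7_sketch(unix_ms=None):
    """Build a UUIDv7-shaped value: timestamp first, random bits last."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF           # 12 random bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 random bits
    value = (unix_ms & 0xFFFF_FFFF_FFFF) << 80  # 48-bit Unix time in milliseconds
    value |= 0x7 << 76                          # version nibble = 7
    value |= rand_a << 64
    value |= 0b10 << 62                         # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)

earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)

# A later timestamp always sorts after an earlier one, which is what keeps
# index inserts roughly append-only.
print(earlier < later)  # True

# The 16-character string ID costs the same 16 bytes as the UUID.
print(len(later.bytes), len("2025_P5U5_326662".encode("ascii")))  # 16 16
```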

taylodl 4 months ago

Use UUIDv7 and be done with it. It solves your database indexing problem, too.

mort96 4 months ago

Except that it leaks the creation timestamp. You really have to think hard about: 1) are IDs of resources created by one person (such as their account, uploaded images, whatever) accessible to another person? and 2) does it matter that you leak the creation time?
In some situations, it doesn't matter; creation times are always intentionally leaked anyway. For example, here on HN, I can just click your username and see that you've been a user since 2012, and all your posts are publicly timestamped. That's not sensitive info, so accounts and comments on HN could use UUIDv7. But there are other times where we may not want everyone to see the account creation time of everyone else.
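For anyone wondering how easy the leak is, a minimal sketch: the top 48 bits of a UUIDv7 are the Unix time in milliseconds, so anyone holding the ID can read the creation time. The sample value is the UUIDv7 test vector from RFC 9562; `uuid7_created_at` is a hypothetical helper name.

```python
import uuid
from datetime import datetime, timezone

def uuid7_created_at(u: uuid.UUID) -> datetime:
    """Recover the creation time embedded in a UUIDv7."""
    if u.version != 7:
        raise ValueError("not a UUIDv7")
    unix_ms = u.int >> 80  # top 48 bits of the 128-bit value
    return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)

example = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
print(uuid7_created_at(example).isoformat())  # 2022-02-22T19:22:22+00:00
```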
- taylodl 4 months ago
  
  If your system needs to prevent leakage of creation time - typically due to regulatory or privacy requirements - you can use AES-SIV to encrypt UUIDv7 identifiers.
  All APIs and URLs would expose only the encrypted form of the ID. When performing record lookups, your data tier would decrypt the ID to retrieve the original UUIDv7. This preserves the indexing and sortability benefits of UUIDv7 while mitigating timestamp leakage.
  This approach is compatible with technologies like GraphQL, but you’ll need to treat the encrypted ID as an opaque string in your schema. Your resolvers will decrypt the ID before querying the database. Just ensure that:
  - The encryption is deterministic (so the same UUIDv7 always maps to the same encrypted ID).
  - You handle decryption securely and efficiently in your resolver logic.
  This pattern works well for systems where external exposure of creation time is undesirable, but internal sortability and uniqueness are still critical.
  - mort96 4 months ago
    
    This works, but now you have a very sensitive secret to keep track of. Losing that secret immediately destroys your database (you can't just create a new one like you would if you e.g. lost an API key for a third-party service), and leaking the key leaks all creation dates. For lots of use cases, this UUID secret would be the most critical secret of all; and for lots of other use cases, it would be the only secret.
    Plus, the UUIDs obviously wouldn't be encrypted at rest, so all creation dates would leak if you had any kind of database leak (although you could argue that leaking creation timestamps would be the least of your concerns if you had a DB leak).
    So it's a valid solution for sure, but certainly not without trade-offs.
    
    taylodl 4 months ago
    
    The encryption key in an AES-SIV-based UUIDv7 scheme would indeed become a critical secret, and its compromise could expose creation timestamps. That said, I’d like to clarify a few points:
    Database Integrity: Losing the encryption key does not destroy the database. The core data remains intact. What’s affected is the ability to interpret the encrypted timestamp metadata. External systems relying on those timestamps may encounter decryption errors if using an outdated key, but this is a deliberate safeguard. Rather than returning incorrect data, the system would fail securely - prompting the external system to re-retrieve the record or fall back to alternative metadata.
    Key Rotation as a Mitigation Strategy: Regular key rotation is a best practice in cryptographic systems and is especially important here. By rotating keys periodically and tagging external identifiers with a key identifier, we can scope the impact of any key compromise. The key identifier is not embedded in the UUIDv7 itself, nor is it visible externally. Instead, it is used internally - either as metadata or as part of the encryption context in AES-SIV - to ensure the correct key and cipher configuration are applied during decryption. Older keys can be retained for decryption during a grace period and then retired. AES-SIV’s misuse resistance ensures that decryption with the wrong key or context fails securely, preventing silent errors.
    Risk Context: If the database were compromised, timestamp leakage would likely be among the least critical concerns. Encrypting timestamps helps mitigate passive metadata leakage in lower-risk scenarios (e.g., internal misuse or partial leaks), not full-blown breaches.
    Use Case Sensitivity: This approach isn’t universally applicable. For systems where timestamp inference is low-risk, the added complexity may not be justified. But for privacy-sensitive environments - where timestamp leakage could enable profiling or behavioral inference - AES-SIV with key rotation offers a strong balance of security and operational resilience.
    While the encryption key is sensitive, the system can be designed to handle key rotation gracefully, fail securely, and preserve data integrity even in the face of key loss or compromise. As you can tell, I've had to worry about this stuff!
    
    mort96 4 months ago
    
    Your comment reads a lot like a ChatGPT-generated counter argument, so I'll be brief, in case I'm wasting my time arguing with a bot.
    1. The database remains readable if you lose the key, but all IDs which have been given out to clients will become invalid. This can be anywhere between an inconvenience and a catastrophe. The statement "What’s affected is the ability to interpret the encrypted timestamp metadata" is completely wrong: there is no "encrypted timestamp metadata", what's lost is the ability to translate IDs given out to clients into IDs in the database. Very ChatGPT-like mistake to make.
    2. The moment a key is leaked, all IDs which have been handed out to clients since the last key rotation suddenly reveal timestamps.
    3. The statement "If the database were compromised, timestamp leakage would likely be among the least critical concerns" just repeats what I said in my comment.
    4. This point also literally repeats what I wrote as if it's an argument against what I wrote.
    Please have the decency to write your arguments yourself; this is ridiculous.
    
    taylodl 4 months ago
    
    Not a bot but hey, thanks for the compliment?
    1. Clients can't expect to cache data forever and have it still be correct. Think of what they've been provided as an "access code" to the information they want. That access code will expire. They have a means for getting a new code. Your data integrity is preserved. You can manually force the expiration of that access code at any time operations requires.
    2. If the DB hasn't been leaked then all that's happening is the clients can see the timestamps of their data, which presumably you were going to display to them at some point?
    3. Yes, I was affirming your point.
    4. It's a clarification of what you wrote, especially the part where you said "losing that secret immediately destroys your database." No, it doesn't. Your database is intact. Someone may gain the ability to determine the timestamps for their data, as said in (2), but as long as there wasn't a widespread DB leak, then you're fine. We've already established if you have a DB leak then all bets are off. Even then, there are ways to mitigate the impact of such a leak.
    I do all my own writing. AI edits the results. Sorry not sorry if you don't like the modern workflow.
    
    mort96 4 months ago
    
    > AI edits the results.
    That makes you sound like a bot.
    I'm done talking to bots. This conversation is over.
    
    taylodl 4 months ago
    
    Correction, you're talking to someone with 40 years of experience building systems that people’s lives and money depend on. This conversation was never exclusively about you as there are plenty of HN readers lurking and learning. This is just one way for an old, grizzled geezer to share what he’s learned. Sure, I could write blog posts, but that’s too formal and hardly anyone reads those anyway.

PaulHoule 4 months ago

Part of the vision of the semantic web is that you can take random data from two random sources, load it into a SPARQL store, and start writing queries. URI namespaces help, but if people use truly random identifiers it ‘just works’ and there is nothing paranoid about it.

firasdOP 4 months ago

Well, isn't Wikipedia one of the biggest SPARQL-queryable DBs out there? They use 'slug'-based IDs.
I don't oppose randomization, but what I say is that time is entropy. Use it as a prefix. And UUIDv7, which uses time, is still too long and ugly.
WordPress powers half the web and there are no UUIDs in its URLs... so in practice using UUID is just not aesthetically pleasing for URLs
Postscript: And in fact LLMs seem to be better at fulfilling the Tim Berners-Lee and Bill Gates dreams of universal interop. They can just say "oh here's the weather in Amsterdam according to this json" without any rigid interop ID or protocol.
- PaulHoule 4 months ago
  
  ... And be right 85% of the time.
  Back in the 1990s there was that sordid episode when Microsoft Office used those other UUIDs which were based on time and a MAC address so any Office document could be tracked to what computer you used and when.
  In my lost half-decade I was pursuing the dream of "Real Semantics", which, in retrospect, was a baby Cyc, not so much a master knowledge base but a system for maintaining knowledge bases for other systems that could hold several parallel knowledge bases at once. I read everything public about Cyc and also tried using DBpedia and Freebase as a prototype "master" database. Lenat strongly believed you'd get in trouble pretty quickly if you tried to make non-opaque identifiers but Wikipedia has done a pretty good job of it for 7M items, with the caveat that Wikipedia doesn't have a consistent level of modelling granularity (e.g. it is so much easier for a video game to be notable than a book, some artists have all their major songs in Wikipedia, others don't, there is no such thing as a "Ford Thunderbird" but there is a "7th generation Ford Thunderbird", etc.)

luismedel 4 months ago

I stopped using UUIDs a while ago and I'm happy with TSIDs[0]. Shorter, sortable and can add semantic meaning to ids.

[0] https://github.com/luismedel/tsid-python

Faaak 4 months ago

I don't know how I really feel about having a Github Action that updates the Readme of the git repo in order to have a "somehow realtime Alphadec". So many commits, so many wasted resources :'(
