Ask HN: How should I create a unique id for entries that aren't incremental?
For example, right now when I'm creating boards (agile), it will create a new board and its id will be n + 1.
What is an efficient way of creating an ID where there won't be any collision even if there are 1 billion+ entries?
This ID will be used in a URL.
Thanks,
Tim

> where there won't be any collision even if there are 1 billion+ entries

This is a really complicated topic, and there are multiple ways to handle what you're doing. It really depends on your read/write ratios, typical volume, growth rate, and the underlying DB software you're using. Because there are so many considerations that require knowing real-world use cases, it's a premature optimization. Are you going to have more than 1 billion records in the next few years? If not, don't worry about this. However, there are other reasons to use non-incremental IDs (security, for one). To answer your question as asked though, check this out: http://www.postgresql.org/docs/8.3/static/datatype-uuid.html

Hmm, well right now users are complaining it's too easy to view other people's boards because the URL is https://www.taskfort.com/view/10. The only way a person's board can't be viewed is if it's private. There are some services whose pages have IDs that are 7 or so characters long and very compact; the UUID you're referencing seems kind of ugly. I would still keep my incremental ID in the table as a PK, but maybe I could generate a new value per row for a public URL ID. That public URL ID could be based off of the PK, but I don't know what would be the best way to generate a short URL ID with the PK as a key.

This looks like a good implementation: http://kvz.io/blog/2009/06/10/create-short-ids-with-php-like...

> However, there are other reasons to use non-incremental IDs (security, for one).

That's just security by obscurity; with proper authorization checking it doesn't matter.

Security doesn't always mean "seeing something you're not supposed to see". He's saying that the boards are public, so people are able to just change the number at the end of the URL to find them all. You can have the same issue with scrapers. It's much easier for scrapers to get all your pages if you use sequential numbers for unique IDs.
Yes, a search engine could index the pages, but the big engines will obey your robots.txt, and the small engines will never know that you exist most likely. So s/he's not trying to "secure" anything as much as just hide it.

Security by obscurity isn't a bad practice, it just shouldn't be your main practice. Absolutely design your system with the assumption that the attacker has complete access to all information about your setup, but it's still reasonable to try to obscure as many of those details as possible. Your setup will have flaws and you will make mistakes, so you want to try to minimize the damage those mistakes might cause and increase the time/effort needed to exploit them.

Incremental IDs work best, but if you want you can hash a UUID which will work for your use case:

  % uuidgen
  B14818B6-4219-43BD-82EF-8421EC1AFBCF
  % echo "B14818B6-4219-43BD-82EF-8421EC1AFBCF" | shasum -a 256
  00ea501d47789ac5eb559f10d631b3f6df8f82b5cba9c1f9d234b705d89f1704

Those urls are kind of ugly. If this helps at all, maybe I could create a public url id based off the incremental PK id's.

How about hashing the incremental?
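On hashing the incremental: a plain hash of 1, 2, 3, ... is still guessable, since anyone can hash the same integers and walk the URLs. One option is a keyed hash (HMAC) of the PK, shortened and written in a compact alphabet. Below is a rough Python sketch, not a definitive implementation. The names `SECRET_KEY`, `base62_encode`, and `public_id` are made up for illustration and don't come from this thread, and it assumes the result is stored in a column with a unique index, because a truncated hash can collide in principle.

```python
import hashlib
import hmac

# Hypothetical secret (an assumption); it would normally come from server config.
SECRET_KEY = b"replace-with-a-long-random-secret"

# 0-9, A-Z, a-z: 62 URL-safe characters.
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(number: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet above."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return BASE62[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, 62)
        digits.append(BASE62[remainder])
    return "".join(reversed(digits))


def public_id(pk: int) -> str:
    """Derive a fixed-length, non-sequential token from an incremental primary key."""
    digest = hmac.new(SECRET_KEY, str(pk).encode("ascii"), hashlib.sha256).digest()
    # Keep the first 8 bytes (64 bits) of the keyed hash.
    value = int.from_bytes(digest[:8], "big")
    # A 64-bit value needs at most 11 base62 digits; left-pad to a constant length.
    return base62_encode(value).rjust(11, BASE62[0])


if __name__ == "__main__":
    for pk in (10, 11, 12):
        print(pk, public_id(pk))
```

The PK stays the real key in the table; the token is only what goes in the URL. Neighbouring PKs give unrelated-looking tokens, and without the secret nobody can compute the token for board 11 from the one for board 10. It doesn't replace the private-board check, it only makes public boards harder to enumerate.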
Now I wonder how ids like imgur or youtube work.

Base62 encoded incremental IDs

That's still incremental though. It just looks different.

bigint gives you 9.2 quintillion options before you run into a collision. Which is obviously not forever proofed, but certainly future proofed.

OP isn't asking for a data type that can hold more than a billion records. OP needs a way to generate a random ID without checking that the ID has already been used -- s/he wants a UUID. Bigint is FAR too small to do that a billion times without a collision.
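The bigint disagreement comes down to the birthday bound: for n IDs drawn uniformly at random from N possible values, the chance of at least one collision is roughly 1 - exp(-n(n-1)/(2N)). Here is a rough Python sketch of that estimate. It assumes the IDs are uniformly random, that a signed bigint offers 2^63 non-negative values, and that a random (version 4) UUID has 122 random bits. The function name `collision_probability` is made up for illustration.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate chance of at least one collision among n uniformly random
    IDs drawn from 2**bits possible values (birthday-bound approximation)."""
    space = 2 ** bits
    exponent = -n * (n - 1) / (2 * space)
    # expm1 keeps precision when the resulting probability is tiny.
    return -math.expm1(exponent)


if __name__ == "__main__":
    n = 1_000_000_000  # one billion entries, as in the question
    print(f"random 63-bit bigint:  {collision_probability(n, 63):.3f}")
    print(f"random 122-bit UUIDv4: {collision_probability(n, 122):.3e}")
```

Under those assumptions, a billion random bigints have about a 5% chance of at least one collision, while for random UUIDs the chance is negligible. So both comments hold up in their own setting: 9.2 quintillion values is plenty when IDs are handed out sequentially, but random bigints would need a uniqueness check and retry on insert.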