Show HN: Create LLM-optimized random identifiers (github.com)
I went exploring whether using LLM tokens as the individual "digits" of random ids would let you get more randomness for the same number of tokens, and the answer is yes. The current strategy in this library is roughly 50% more token-efficient than using base64 ids.
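For illustration, here's a minimal sketch of the idea in Python (my own illustration, not this library's actual API): sample random tokens from a tokenizer vocabulary with tiktoken and use each one as a "digit" of the id. The filtering rules and function name are assumptions, and the real library presumably also handles token-boundary effects (concatenated tokens can merge differently when re-tokenized), which this sketch ignores.

    import secrets
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # ~100k-token vocabulary

    def random_token_id(num_tokens: int = 4) -> str:
        # Each token drawn from a ~100k vocabulary carries roughly
        # log2(100_000) ~= 16.6 bits of entropy, versus ~6 bits per base64
        # character (and a base64 id may itself re-tokenize into multi-char
        # tokens, which is what the per-token comparison is really about).
        token_ids = []
        while len(token_ids) < num_tokens:
            candidate = secrets.randbelow(enc.n_vocab)
            try:
                text = enc.decode([candidate])
            except Exception:
                continue  # skip ids with no decoding (gaps in the vocabulary)
            # Keep only printable ASCII tokens without spaces so the id is
            # easy to paste around and less likely to hit odd boundaries.
            if text and text.isascii() and text.isprintable() and " " not in text:
                token_ids.append(candidate)
        return enc.decode(token_ids)

    print(random_token_id())

Note that the ASCII/printable filter throws away some of the vocabulary, so the effective entropy per token is a bit under the 16.6-bit ceiling.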
I also ran hundreds of sessions against the OpenAI API to see whether the logprobs would look off with this strategy compared to base64 ids, and they seem about the same or possibly slightly better (more "peaky").
Could be useful for agentic frameworks where tool results need to provide ids to refer back to later. A small win at best, but it was fun to explore!

> what does "logprobs look off" mean?

If the immediate next-token probabilities are flat, that would mean the LLM is not able to predict the next token with any certainty. This might happen if an LLM is thrown off by out-of-distribution data, though I haven't personally seen it happen with modern models, so it was mostly a sanity check. Examples from the past that would cause this have been simple things like not normalizing token boundaries in your input, trailing whitespace, etc., and sometimes using very rare tokens, a.k.a. "glitch tokens" (https://en.wikipedia.org/wiki/Glitch_token).
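For anyone curious what that check looks like, here's a rough sketch (my own illustration, not the exact experiment): ask the model to echo an id back and inspect the per-token logprobs from the OpenAI chat completions API. A "peaky" distribution means the chosen token holds nearly all the probability mass; a flat one would suggest the id threw the model off. The model name, prompt, and id are placeholders, and the client assumes OPENAI_API_KEY is set.

    import math
    from openai import OpenAI

    client = OpenAI()
    some_id = "x7Qmfz"  # placeholder; imagine a token-based or base64 id here

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": f"Repeat this id exactly: {some_id}"}],
        logprobs=True,
        top_logprobs=5,
        max_tokens=20,
    )

    # For each generated token, compare the chosen token's probability with
    # the runner-up: a large gap ("peaky") means the model is confident,
    # a flat distribution would mean it can't predict the next token.
    for tok in resp.choices[0].logprobs.content:
        top = tok.top_logprobs
        p_first = math.exp(top[0].logprob)
        p_second = math.exp(top[1].logprob) if len(top) > 1 else 0.0
        print(f"{tok.token!r}: p1={p_first:.3f} p2={p_second:.3f}")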