
Show HN: Create LLM-optimized random identifiers (github.com)

2 points by blixt 20 days ago · 2 comments

I went exploring whether using LLM tokens as the individual "digits" of random ids would let you get more randomness for the same number of tokens, and the answer is yes. The current strategy in this library is about 50% more token-efficient than using base64 ids.
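
A minimal sketch of the idea, not the library's actual code: treat strings that each encode to a single token as the "digits", sample uniformly from that alphabet, and re-check that concatenation doesn't merge tokens at the boundaries:

    import math
    import secrets

    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    # Alphabet of "digits": ASCII-alphanumeric strings that are exactly one
    # token each. The library's actual alphabet selection will differ.
    alphabet = []
    for token_id in range(enc.n_vocab):
        try:
            s = enc.decode([token_id])
        except Exception:
            continue  # a few ids in this range are unused
        if s.isascii() and s.isalnum() and enc.encode(s) == [token_id]:
            alphabet.append(s)

    # Each sampled token carries log2(len(alphabet)) bits; base64 encodes
    # 6 bits per character, and how many characters end up in each token
    # depends on the tokenizer.
    print(f"{len(alphabet)} digits, {math.log2(len(alphabet)):.1f} bits/token")

    def random_id(num_tokens: int = 4) -> str:
        # BPE can merge adjacent digits into fewer tokens, so re-encode
        # and resample until the id really costs num_tokens tokens.
        while True:
            candidate = "".join(secrets.choice(alphabet) for _ in range(num_tokens))
            if len(enc.encode(candidate)) == num_tokens:
                return candidate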

I also ran hundreds of sessions against the OpenAI API to see if the logprobs would look off with this strategy compared to base64 ids, and it seems about the same or possibly slightly better (more "peaky").
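
For reference, a rough sketch of this kind of check (the model, prompt, and id below are placeholders, not the harness actually used): request logprobs from the chat completions API and look at the per-token distributions:

    from openai import OpenAI  # pip install openai

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": 'Repeat this id exactly: "kq3vZx81"'}],
        logprobs=True,
        top_logprobs=5,
    )

    # A confident ("peaky") model puts nearly all probability on one token;
    # flat top-5 alternatives at some position would be the "off" signal.
    for tok in resp.choices[0].logprobs.content:
        alts = {t.token: round(t.logprob, 2) for t in tok.top_logprobs}
        print(f"{tok.token!r}: {tok.logprob:.2f}  top-5: {alts}")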

Could be useful for agentic frameworks where tool results need to provide ids to refer back to later. A small win at best, but it was fun to explore!

anonymoushn 20 days ago

What does "logprobs look off" mean?

  • blixt (OP) 20 days ago

    If the immediate next-token probabilities are flat, that would mean the LLM is not able to predict the next token with any certainty. This might happen if an LLM is thrown off by out-of-distribution data, though I haven't personally seen it happen with modern models, so it was mostly a sanity check. Examples from the past that would cause this have been simple things like not normalizing token boundaries in your input, trailing whitespace, etc., and sometimes very rare tokens, a.k.a. "glitch tokens" (https://en.wikipedia.org/wiki/Glitch_token).
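
    A minimal sketch of one way to make "flat" vs "peaky" concrete, using only the top-k alternatives the API returns (illustrative, not part of the library):

        import math

        def top_k_entropy(top_logprobs: list[float]) -> float:
            # Renormalize the returned top-k logprobs and compute entropy
            # in bits: near 0 means one dominant ("peaky") candidate;
            # higher means flatter. Only a proxy, since the API exposes
            # top-k alternatives, not the full next-token distribution.
            probs = [math.exp(lp) for lp in top_logprobs]
            total = sum(probs)
            return -sum(p / total * math.log2(p / total) for p in probs)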
