GitHub - blixt/tokeydokey: Create random identifiers using a fixed number of non-overlapping LLM tokens.

tokeydokey

Create random identifiers using a fixed number of non-overlapping LLM tokens.

Quick start

uv add tokeydokey
uv run python - <<'PY'
import tokeydokey

print(tokeydokey.generate())
# e.g. "cache.Enable-Thread.sort" (4 tokens by default)
print(tokeydokey.generate(n=5))
# e.g. "db.Connection-Reset.queue.ready"
PY
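The idea behind the output above can be sketched as follows: pick one entry from a start pool, then append entries from a "next" pool until the requested token count is reached. This is a minimal illustration, not the library's implementation. The names `START_POOL`, `NEXT_POOL`, and `generate_sketch` are hypothetical, the pools are placeholder strings borrowed from the example output, and the use of `secrets` for randomness is an assumption.

```python
import secrets

# Hypothetical stand-in pools; the real ones live in src/tokeydokey/_pools.py.
START_POOL = ["cache", "db", "queue", "thread"]
NEXT_POOL = [".Enable", "-Thread", ".sort", ".Connection", "-Reset", ".queue", ".ready"]


def generate_sketch(n: int = 4) -> str:
    """Join one start-pool entry with n - 1 next-pool entries."""
    if n < 1:
        raise ValueError("n must be at least 1")
    parts = [secrets.choice(START_POOL)]
    parts.extend(secrets.choice(NEXT_POOL) for _ in range(n - 1))
    return "".join(parts)


print(generate_sketch())     # 4 parts
print(generate_sketch(n=5))  # 5 parts
```

Because every next-pool entry carries its own "." or "-" prefix, the parts can be concatenated without an extra separator.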

Development

uv sync --group dev
uv run pytest

Regenerate pools

uv run python scripts/generate_pools.py
uv run python scripts/generate_pools.py --encoding cl100k_base --out src/tokeydokey/_pools.py

Example pool math (o200k_base, dot/dash union)

Start pool N = 3.89×10^4 (alnum tokens), next pool M = 6.38×10^3 (".word" or "-word"). An identifier of k tokens therefore has N × M^(k-1) combinations.

Tokens  Combinations          Tokens  Combinations          Tokens  Combinations
1       3.89×10^4  (~2^15)    5       6.46×10^19 (~2^66)    9       1.07×10^35 (~2^116)
2       2.49×10^8  (~2^28)    6       4.12×10^23 (~2^78)    10      6.83×10^38 (~2^129)
3       1.59×10^12 (~2^41)    7       2.63×10^27 (~2^91)    11      4.36×10^42 (~2^142)
4       1.01×10^16 (~2^53)    8       1.68×10^31 (~2^104)   12      2.78×10^46 (~2^154)
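The table follows from multiplying the start pool size by the next pool size once per additional token. A short sketch of that arithmetic, using the rounded pool sizes quoted above (so the last printed digit can differ slightly from the table, which was presumably computed from exact counts):

```python
import math

N = 3.89e4  # start pool size, rounded value from the text
M = 6.38e3  # next pool size, rounded value from the text


def combinations(tokens: int) -> float:
    """Number of distinct identifiers with the given token count."""
    return N * M ** (tokens - 1)


for k in range(1, 13):
    c = combinations(k)
    # Print the count and its size in bits (log base 2), rounded.
    print(f"{k:2d}  {c:.2e}  (~2^{math.log2(c):.0f})")
```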

Note: For ~128 bits of entropy, base64 needs 22 chars (132 bits), which average ~15.2 tokens in o200k_base; dot/dash union needs ~10 tokens (~129 bits). Random base64 identifiers therefore cost roughly 50% more tokens for the same entropy.
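The comparison in the note can be checked with a few lines of arithmetic. This sketch takes the rounded pool sizes and the ~15.2 average token count for base64 from the text as given; the helper name `tokens_needed` is made up for illustration.

```python
import math

N = 3.89e4  # start pool size, rounded value from the text
M = 6.38e3  # next pool size, rounded value from the text


def tokens_needed(target_bits: float) -> int:
    """Smallest token count whose entropy reaches target_bits."""
    bits = math.log2(N)  # the first token comes from the start pool
    tokens = 1
    while bits < target_bits:
        bits += math.log2(M)  # each further token comes from the next pool
        tokens += 1
    return tokens


BASE64_CHARS = 22
BASE64_AVG_TOKENS = 15.2  # average quoted in the note, not measured here

base64_bits = BASE64_CHARS * 6  # each base64 character carries 6 bits
tokey_tokens = tokens_needed(128)
ratio = BASE64_AVG_TOKENS / tokey_tokens

print(f"base64: {BASE64_CHARS} chars = {base64_bits} bits, ~{BASE64_AVG_TOKENS} tokens")
print(f"dot/dash union: {tokey_tokens} tokens")
print(f"base64 uses about {ratio:.2f}x as many tokens")
```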

Alternatives considered

  • CamelTitle (Titlecase 2-12 chars): pool size 8,482, 100% compatible for concatenation.
  • Word/(Word+Number) alternating: union pool size 9,482 (adds 0-999), 100% compatible.
  • Dot-only: next pool 4,410, 100% compatible.
  • Base62: around 8.6 bits per token in o200k_base; token count varies.
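To compare these alternatives on the same scale as the Base62 figure, each pool size can be converted to bits per token with a base-2 logarithm. A small sketch using the pool sizes listed above (6,380 is the rounded dot/dash union size from the pool math section); the dictionary labels are just descriptive names:

```python
import math

# Pool sizes as quoted in this README.
pool_sizes = {
    "CamelTitle": 8482,
    "Word/(Word+Number) union": 9482,
    "Dot-only next pool": 4410,
    "Dot/dash union next pool": 6380,  # rounded from 6.38×10^3
}

# Bits of entropy contributed by one uniformly random pick from each pool.
bits_per_token = {name: math.log2(size) for name, size in pool_sizes.items()}

for name, bits in bits_per_token.items():
    print(f"{name}: {bits:.2f} bits per token")
```

All four land above 12 bits per token, compared with around 8.6 for Base62.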

Attribution

Token pool data in src/tokeydokey/_pools.py is derived from the tiktoken vocabulary. See NOTICE for details.

License

MIT