GitHub - blixt/tokeydokey: Create random identifiers using a fixed number of non-overlapping LLM tokens.

tokeydokey

Create random identifiers using a fixed number of non-overlapping LLM tokens.

Quick start

uv add tokeydokey
uv run python - <<'PY'
import tokeydokey

print(tokeydokey.generate())
# e.g. "cache.Enable-Thread.sort" (4 tokens by default)
print(tokeydokey.generate(n=5))
# e.g. "db.Connection-Reset.queue.ready"
PY
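Each identifier above is one start token followed by continuation tokens that each begin with "." or "-". Here is a minimal sketch of that structure. It is not tokeydokey's implementation. The two pools are hypothetical toy lists (the real ones are generated into src/tokeydokey/_pools.py), `generate_sketch` is an illustrative name, and drawing tokens with Python's `secrets` module is an assumption.

```python
import secrets

# Hypothetical toy pools for illustration only; the real pools are generated
# from the tiktoken vocabulary into src/tokeydokey/_pools.py.
START_POOL = ["cache", "db", "user", "token"]                    # bare alphanumeric tokens
NEXT_POOL = [".sort", ".queue", "-Thread", "-Reset", ".ready"]   # ".word" / "-word" tokens


def generate_sketch(n: int = 4) -> str:
    """Join one start token and n - 1 continuation tokens, each drawn uniformly at random."""
    if n < 1:
        raise ValueError("n must be at least 1")
    parts = [secrets.choice(START_POOL)]
    parts.extend(secrets.choice(NEXT_POOL) for _ in range(n - 1))
    # Continuation tokens carry their own "." or "-", so no joiner is needed.
    return "".join(parts)


print(generate_sketch())      # e.g. "db.queue-Thread.sort"
print(generate_sketch(n=5))
```

Because every draw comes from a fixed pool, each token contributes a fixed amount of entropy, which is what the pool math section counts.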

Development

uv sync --group dev
uv run pytest

Regenerate pools

uv run python scripts/generate_pools.py
uv run python scripts/generate_pools.py --encoding cl100k_base --out src/tokeydokey/_pools.py

Example pool math (o200k_base, dot/dash union)

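Start pool N = 3.89×10⁴ (alphanumeric tokens, used for the first token); next pool M = 6.38×10³ (tokens of the form ".word" or "-word", used for every later token). A k-token identifier therefore has N·M^(k−1) possible values: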

Tokens	Combinations	Tokens	Combinations	Tokens	Combinations
1	3.89×10⁴ (~2¹⁵)	5	6.46×10¹⁹ (~2⁶⁶)	9	1.07×10³⁵ (~2¹¹⁶)
2	2.49×10⁸ (~2²⁸)	6	4.12×10²³ (~2⁷⁸)	10	6.83×10³⁸ (~2¹²⁹)
3	1.59×10¹² (~2⁴¹)	7	2.63×10²⁷ (~2⁹¹)	11	4.36×10⁴² (~2¹⁴²)
4	1.01×10¹⁶ (~2⁵³)	8	1.68×10³¹ (~2¹⁰⁴)	12	2.78×10⁴⁶ (~2¹⁵⁴)
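The table follows from choosing the first token from the start pool and each later token from the next pool. A k-token identifier therefore has N·M^(k−1) combinations and log₂N + (k−1)·log₂M bits of entropy. The short Python sketch below reproduces the table. It uses the rounded pool sizes quoted above (38,900 and 6,380), so some combination counts can differ from the table in the third significant figure. The rounded bit exponents match.

```python
import math

# Rounded pool sizes quoted in the text (o200k_base, dot/dash union);
# exact counts live in the generated pools.
N = 38_900  # start pool: alphanumeric tokens
M = 6_380   # next pool: ".word" or "-word" tokens


def combinations(k: int) -> int:
    """Distinct k-token identifiers: N choices for the first token, M for each later one."""
    return N * M ** (k - 1)


def bits(k: int) -> float:
    """Entropy in bits of a uniformly random k-token identifier."""
    return math.log2(N) + (k - 1) * math.log2(M)


for k in range(1, 13):
    print(f"{k:2d}  {combinations(k):.2e}  ~2^{round(bits(k))}")
```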

Note: For ~128 bits of entropy, base64 needs 22 characters (22 × 6 = 132 bits), which encode to ~15.2 tokens on average in o200k_base. The dot/dash union needs 10 tokens (~2¹²⁹). Random base64 identifiers therefore spend roughly 50% more tokens (15.2 vs. 10) for the same entropy.
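The figures in this note can be checked the same way. Find the smallest k whose entropy reaches 128 bits, then compare it with the 22 base64 characters needed at 6 bits per character. The ~15.2-token base64 average comes from the note and is not computed here, since measuring it would require running the o200k_base tokenizer over random strings. N and M are again the rounded pool sizes.

```python
import math

# Rounded pool sizes from the pool-math section (o200k_base, dot/dash union).
N, M = 38_900, 6_380
TARGET_BITS = 128


def tokens_needed(target_bits: float) -> int:
    """Smallest k with log2(N) + (k - 1) * log2(M) >= target_bits."""
    extra = max(0.0, target_bits - math.log2(N))  # bits still needed after the first token
    return 1 + math.ceil(extra / math.log2(M))


base64_chars = math.ceil(TARGET_BITS / 6)    # 6 bits per base64 character -> 22
dotdash_tokens = tokens_needed(TARGET_BITS)  # -> 10 (about 2^129)
BASE64_AVG_TOKENS = 15.2                     # quoted average, not measured here

print(base64_chars, dotdash_tokens, f"{BASE64_AVG_TOKENS / dotdash_tokens:.2f}")  # 22 10 1.52
```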

Alternatives considered

CamelTitle (Titlecase tokens of 2–12 chars): pool size 8,482, 100% compatible for concatenation.
Word/(Word+Number) alternating: union pool size 9,482 (adds the numbers 0–999), 100% compatible for concatenation.
Dot-only: next pool 4,410, 100% compatible for concatenation.
Base62: around 8.6 bits per token in o200k_base; token count varies.
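We read "100% compatible for concatenation" as follows: joining tokens from the pools and re-tokenizing the result gives back exactly the chosen tokens, so a k-token identifier really costs k tokens. The sketch below illustrates that check under loud assumptions. `toy_tokenize` is a hypothetical greedy longest-match tokenizer over a made-up vocabulary, standing in for tiktoken's BPE (which works differently). `round_trips` and `pools_compatible` are illustrative helper names and are not part of tokeydokey.

```python
from itertools import product

# Made-up toy vocabulary; note that it contains both "db" and "db.".
VOCAB = {"cache", "db", "db.", "sort", ".sort", ".queue", "-Thread"}


def toy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match tokenizer: a hypothetical stand-in, not real BPE."""
    max_len = max(map(len, vocab))
    out, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a vocab entry matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab:
                out.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {text[i:]!r}")
    return out


def round_trips(parts: list[str], vocab: set[str]) -> bool:
    """True if joining the parts and re-tokenizing yields exactly the same parts."""
    return toy_tokenize("".join(parts), vocab) == parts


def pools_compatible(start: list[str], nxt: list[str], vocab: set[str], n: int = 3) -> bool:
    """Check every n-token sequence: one start token, then n - 1 next-pool tokens."""
    return all(
        round_trips([first, *rest], vocab)
        for first in start
        for rest in product(nxt, repeat=n - 1)
    )


print(round_trips(["cache", ".sort"], VOCAB))  # True
print(round_trips(["db", ".sort"], VOCAB))     # False: re-tokenizes as ["db.", "sort"]
```

In this toy vocabulary, "db" followed by ".sort" fails because the tokenizer prefers the longer "db." token. This is presumably the kind of overlap the pool generation has to filter out.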

Attribution

Token pool data in src/tokeydokey/_pools.py is derived from the tiktoken vocabulary. See NOTICE for details.

License

MIT