tokeydokey
Create random identifiers using a fixed number of non-overlapping LLM tokens.
Quick start
    uv add tokeydokey
    uv run python - <<'PY'
    import tokeydokey
    print(tokeydokey.generate())     # e.g. "cache.Enable-Thread.sort" (4 tokens by default)
    print(tokeydokey.generate(n=5))  # e.g. "db.Connection-Reset.queue.ready"
    PY
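The example identifiers suggest a shape of one leading alphanumeric token followed by tokens that each begin with "." or "-". The following is a minimal sketch of that idea, not the library's actual implementation: the tiny pools, the name `sketch_generate`, and the use of `secrets` for randomness are all assumptions made for illustration.

```python
import secrets

# Hypothetical miniature pools for illustration only. The real pools live in
# src/tokeydokey/_pools.py and are far larger.
START_POOL = ["cache", "db", "user", "queue"]
NEXT_POOL = [".Enable", "-Thread", ".sort", ".Connection", "-Reset", ".ready"]


def sketch_generate(n: int = 4) -> str:
    """Join one start token with n - 1 separator-prefixed tokens."""
    if n < 1:
        raise ValueError("n must be at least 1")
    parts = [secrets.choice(START_POOL)]
    parts.extend(secrets.choice(NEXT_POOL) for _ in range(n - 1))
    return "".join(parts)


print(sketch_generate())     # four pieces by default
print(sketch_generate(n=5))  # five pieces
```

Because every piece after the first carries its own separator, the pieces can be concatenated directly without inserting any extra characters.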
Development
    uv sync --group dev
    uv run pytest
Regenerate pools
    uv run python scripts/generate_pools.py
    uv run python scripts/generate_pools.py --encoding cl100k_base --out src/tokeydokey/_pools.py
Example pool math (o200k_base, dot/dash union)
Start pool N = 3.89×10^4 (alnum tokens), next pool M = 6.38×10^3 (".word" or "-word").
| Tokens | Combinations | Tokens | Combinations | Tokens | Combinations |
|---|---|---|---|---|---|
| 1 | 3.89×10^4 (~2^15) | 5 | 6.46×10^19 (~2^66) | 9 | 1.07×10^35 (~2^116) |
| 2 | 2.49×10^8 (~2^28) | 6 | 4.12×10^23 (~2^78) | 10 | 6.83×10^38 (~2^129) |
| 3 | 1.59×10^12 (~2^41) | 7 | 2.63×10^27 (~2^91) | 11 | 4.36×10^42 (~2^142) |
| 4 | 1.01×10^16 (~2^53) | 8 | 1.68×10^31 (~2^104) | 12 | 2.78×10^46 (~2^154) |
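The table appears to follow the product N × M^(k−1) for k tokens: one draw from the start pool, then k − 1 draws from the next pool. A short sketch that recomputes it is shown here. It uses the rounded pool sizes quoted above, so the last digit of a mantissa may differ slightly from the table; the function names are illustrative.

```python
import math

N = 3.89e4  # start pool size (rounded value from the text)
M = 6.38e3  # next pool size (rounded value from the text)


def combinations(tokens: int) -> float:
    """Number of distinct identifiers with the given token count."""
    return N * M ** (tokens - 1)


def entropy_bits(tokens: int) -> float:
    """Entropy in bits of a uniformly random identifier."""
    return math.log2(combinations(tokens))


for k in range(1, 13):
    print(f"{k:2d} tokens: {combinations(k):.2e} combinations (~2^{entropy_bits(k):.0f})")
```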
Note: For ~128 bits of entropy, base64 needs 22 characters (132 bits), which average ~15.2 tokens in o200k_base; the dot/dash union needs ~10 tokens. Random base64 identifiers therefore cost roughly 50% more tokens for comparable entropy.
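The arithmetic behind this note can be checked with a few lines. This sketch takes the ~15.2 token average for base64 as given (it is a measured figure and is not recomputed here) and uses the rounded pool sizes from this section.

```python
import math

TARGET_BITS = 128
BASE64_BITS_PER_CHAR = 6   # log2(64)
BASE64_AVG_TOKENS = 15.2   # figure quoted in the note, taken as given
N, M = 3.89e4, 6.38e3      # rounded pool sizes from the text

# Characters of base64 needed to reach the target entropy.
base64_chars = math.ceil(TARGET_BITS / BASE64_BITS_PER_CHAR)

# Smallest token count whose entropy reaches the target.
tokens = 1
while math.log2(N) + (tokens - 1) * math.log2(M) < TARGET_BITS:
    tokens += 1

print(base64_chars, base64_chars * BASE64_BITS_PER_CHAR)  # chars, bits
print(tokens)                                             # dot/dash tokens
print(BASE64_AVG_TOKENS / tokens)                         # token cost ratio
```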
Alternatives considered
- CamelTitle (Titlecase 2-12 chars): pool size 8,482, 100% compatible for concatenation.
- Word/(Word+Number) alternating: union pool size 9,482 (adds 0-999), 100% compatible.
- Dot-only: next pool 4,410, 100% compatible.
- Base62: around 8.6 bits per token in o200k_base; token count varies.
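To put these pool sizes on a common scale, the sketch below converts each one to bits of entropy per token using log2 of the pool size, assuming uniform draws. The dictionary labels are illustrative, and the Base62 figure is the quoted average rather than a computed value.

```python
import math

# Pool sizes quoted in this README (dot/dash value is the rounded 6.38×10^3).
pools = {
    "dot/dash union (next pool)": 6380,
    "CamelTitle": 8482,
    "Word/(Word+Number) union": 9482,
    "Dot-only (next pool)": 4410,
}

bits_per_token = {name: math.log2(size) for name, size in pools.items()}
bits_per_token["Base62 (quoted average)"] = 8.6

for name, bits in bits_per_token.items():
    print(f"{name}: {bits:.2f} bits/token")
```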
Attribution
Token pool data in src/tokeydokey/_pools.py is derived from the tiktoken
vocabulary. See NOTICE for details.
License
MIT