Show HN: ai-tokenizer – 5-7x faster than tiktoken with AI SDK support
Hey HN! I built an AI tokenizer that's 5-7x faster than tiktoken using pure JavaScript - no WebAssembly needed.
Born from frustration with existing packages:
- Existing libraries don't support AI SDK messages and tools. Tool definitions eat a surprising number of tokens (549 just to add one basic tool on Claude [1]), yet there was no way to count them. ai-tokenizer has native AI SDK support with per-tool token breakdowns (rough usage sketched below the list).
- Most models don't publish exact tokenizers. We run real API calls at build time to find the most accurate public BPE tokenizer for each model, then apply per-model calibration weights to reach 97-99% accuracy [2] (second sketch below).
- WebAssembly isn't necessary for great performance, and it hurts portability. ai-tokenizer precompiles BPE vocabularies into optimized hashmaps and ends up 5-7x faster than tiktoken [3] (third sketch below).
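To give a feel for the AI SDK support, here's a simplified sketch of counting a request with a tool attached. Names here are illustrative rather than the exact exported API - the README has the real thing:

    // Illustrative only -- exact export names may differ (see README).
    import { countTokens } from "ai-tokenizer";

    const report = countTokens({
      model: "claude-sonnet-4",
      messages: [{ role: "user", content: "What's the weather in Tokyo?" }],
      tools: {
        getWeather: {
          description: "Get current weather for a city",
          parameters: {
            type: "object",
            properties: { city: { type: "string" } },
          },
        },
      },
    });

    report.total;            // whole-request estimate
    report.tools.getWeather; // per-tool cost, e.g. the 549 above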
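The calibration step is conceptually simple. A minimal sketch of the idea, with a made-up weight and gpt-tokenizer standing in for whichever public BPE encoder wins the build-time comparison:

    // Sketch of the calibration idea; the weight below is made up.
    import { encode } from "gpt-tokenizer"; // stand-in public BPE encoder

    // Imagine build-time API probes showed this encoder undercounting
    // the target model's real usage by ~3% on average.
    const CALIBRATION_WEIGHT = 1.03; // hypothetical per-model factor

    function estimateTokens(text: string): number {
      const raw = encode(text).length;            // raw public-BPE count
      return Math.ceil(raw * CALIBRATION_WEIGHT); // calibrated estimate
    }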
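And the hashmap approach boils down to greedy merges against a precompiled rank table. A toy version of the lookup idea (the real tables cover full vocabularies and are generated at build time):

    // Toy BPE over a precompiled merge-rank map.
    const mergeRanks = new Map<string, number>([
      ["t h", 0],
      ["th e", 1],
      // ...thousands more entries in a real vocabulary
    ]);

    function bpe(word: string): string[] {
      const parts = [...word];
      for (;;) {
        // find the adjacent pair with the lowest (earliest) merge rank
        let best = -1;
        let bestRank = Infinity;
        for (let i = 0; i < parts.length - 1; i++) {
          const rank = mergeRanks.get(parts[i] + " " + parts[i + 1]);
          if (rank !== undefined && rank < bestRank) {
            bestRank = rank;
            best = i;
          }
        }
        if (best < 0) return parts; // no merges left: these are the tokens
        parts.splice(best, 2, parts[best] + parts[best + 1]);
      }
    }

    bpe("the"); // ["the"] after merging t+h, then th+e

There's no WASM boundary to cross, and JS engines optimize hot Map lookups well, which is roughly where the speedup comes from.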
Live Demo: https://coder.github.io/ai-tokenizer
Repository: https://github.com/coder/ai-tokenizer
[1]: https://github.com/coder/ai-tokenizer/blob/main/src/models.j...
[2]: https://github.com/coder/ai-tokenizer/blob/main/scripts/find...
[3]: https://github.com/coder/ai-tokenizer#performance