Show HN: ai-tokenizer – 5-7x faster than tiktoken with AI SDK support

coder.github.io

1 point by kylecarbs 3 months ago · 0 comments · 1 min read

Hey HN! I built an AI tokenizer that's 5-7x faster than tiktoken using pure JavaScript - no WebAssembly needed.

Born from frustration with existing packages:

- Existing libraries don't support AI SDK messages and tools. Tool definitions consume a surprising number of tokens (549 for adding one basic Claude tool [1]), but there's no way to count them. ai-tokenizer has native AI SDK support with per-tool breakdowns.

- Most models don't publish exact tokenizers. We run real API calls at build-time to find the most accurate public BPE tokenizer for each model, then apply calibration weights to achieve 97-99% accuracy [2].

- WebAssembly isn't necessary for great performance, and it hurts portability. ai-tokenizer precompiles BPE vocabularies into optimized hashmaps, achieving 5-7x faster performance than tiktoken [3].
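To give a feel for the hashmap approach: BPE encoding boils down to repeatedly merging the adjacent token pair with the lowest merge rank, looked up in a precomputed map. This is a minimal sketch in plain TypeScript with a tiny hypothetical vocabulary, not ai-tokenizer's actual code or data:

```typescript
// Hypothetical merge ranks (lower rank = merged earlier in training).
// In the real library these maps are precompiled from each model's vocabulary.
const ranks = new Map<string, number>([
  ["h e", 0],
  ["l l", 1],
  ["he ll", 2],
  ["hell o", 3],
]);

// Greedy BPE: start from single characters and keep applying the
// lowest-ranked merge until no adjacent pair is in the rank map.
function bpeEncode(word: string, ranks: Map<string, number>): string[] {
  let parts = Array.from(word);
  while (parts.length > 1) {
    let best = -1;
    let bestRank = Infinity;
    for (let i = 0; i < parts.length - 1; i++) {
      const r = ranks.get(parts[i] + " " + parts[i + 1]);
      if (r !== undefined && r < bestRank) {
        bestRank = r;
        best = i;
      }
    }
    if (best === -1) break; // no applicable merges left
    parts = [
      ...parts.slice(0, best),
      parts[best] + parts[best + 1],
      ...parts.slice(best + 2),
    ];
  }
  return parts;
}

console.log(bpeEncode("hello", ranks)); // ["hello"] — all merges apply
console.log(bpeEncode("hex", ranks)); // ["he", "x"] — only "h e" merges
```

Since every merge is a single hashmap lookup on a string key, the JS engine's optimized string hashing does the heavy lifting, which is one reason pure JavaScript can stay competitive without WebAssembly.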

Live Demo: https://coder.github.io/ai-tokenizer

Repository: https://github.com/coder/ai-tokenizer

[1]: https://github.com/coder/ai-tokenizer/blob/main/src/models.j...

[2]: https://github.com/coder/ai-tokenizer/blob/main/scripts/find...

[3]: https://github.com/coder/ai-tokenizer#performance
