Tokenizer UI for Mistral and Claude

2 points by henry_pulver 2 years ago · 3 comments

I use the OpenAI tokenizer UI a lot when prompt engineering.

On the input side, token counts let you compare data formats (YAML, JSON, TS) and give a crude measure of how much weight each part of a prompt carries. On the output side, they are a relative measure of output speed between prompts (absolute tok/s varies by time of day) and a crude measure of the compute spent on an output, which is why “Think step-by-step” works. Token count also determines the cost of a prompt.
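As a rough illustration of the format comparison and cost points, here is a minimal Python sketch. It is not the code behind the UI. `crude_tokenize` is a hypothetical regex stand-in for a real provider tokenizer, so its counts will not match Mistral, Claude, or OpenAI, and the price passed to `estimate_cost` is a made-up number. To get real counts, pass a real tokenizer's encode function as `tokenize`.

```python
import json
import re
from typing import Callable, Dict, List

# ASSUMPTION: a crude stand-in tokenizer, not any provider's real one.
# It splits text into runs of letters, runs of digits, and single symbols.
def crude_tokenize(text: str) -> List[str]:
    return re.findall(r"[A-Za-z_]+|\d+|[^\sA-Za-z_\d]", text)

def count_tokens(text: str, tokenize: Callable[[str], List[str]] = crude_tokenize) -> int:
    """Number of tokens the given tokenizer produces for text."""
    return len(tokenize(text))

def compare_formats(
    samples: Dict[str, str],
    tokenize: Callable[[str], List[str]] = crude_tokenize,
) -> Dict[str, int]:
    """Token count per named serialization of the same data."""
    return {name: count_tokens(text, tokenize) for name, text in samples.items()}

def estimate_cost(n_tokens: int, usd_per_million_tokens: float) -> float:
    """Cost in USD at a flat per-token price (price is caller-supplied)."""
    return n_tokens * usd_per_million_tokens / 1_000_000

# The same record serialized two ways.
record = {"name": "Ada", "tags": ["a", "b"]}
samples = {
    "json": json.dumps(record),
    "yaml": "name: Ada\ntags:\n  - a\n  - b\n",  # hand-written YAML equivalent
}

counts = compare_formats(samples)
for name, n in counts.items():
    # 2.0 USD per million tokens is a hypothetical price for illustration.
    print(f"{name}: {n} tokens, ~${estimate_cost(n, 2.0):.8f}")
```

With the stand-in tokenizer, the JSON version comes out larger mainly because every quote and bracket counts as its own token. A real BPE tokenizer merges many of those, so the gap between formats should be measured rather than assumed.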

Since there’s no equivalent for other providers, I built one for Mistral & Anthropic. If it’s useful, I can add other providers too - let me know which you’d like.

Zambyte 2 years ago

Thanks for building this. Are the tokens different for the different models? For example, will the Mistral tokenization apply to both the 7B open model and their proprietary API-only models?
- henry_pulver (OP) 2 years ago
  
  It isn't common knowledge which tokenizers Mistral uses for its proprietary models.
  This tokenizer is correct for the 7B open model and the 8x7B MoE model. It's probably the closest match to whatever their proprietary API-only models use.
