

Congratulations to @MistralAI on reducing latency (time to first token) by up to 10X on their API! Mistral have cut latency on their own API from as much as 11 seconds to under 1 second. This matters because Mistral's API is the only place to get Mistral Medium, potentially the highest-quality available model other than GPT-4. Artificial Analysis now assesses Mistral Medium as viable for production, performance-sensitive applications, with consistent throughput of ~21 tokens/s and latency of ~0.5 seconds. We benchmark all supported LLM APIs 8 times per day. For more details, check out artificialanalysis.ai/models/mistral…
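The post quotes two numbers, latency (time to first token) and throughput (tokens/s), without spelling out how they are measured. The sketch below assumes a common approach for streaming APIs and is not Artificial Analysis's published methodology: latency is the gap between sending the request and receiving the first token, and throughput is the rate of the remaining tokens after that. The names `StreamTiming`, `time_to_first_token`, `output_throughput` and `estimated_response_time` are hypothetical, chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StreamTiming:
    # Hypothetical record of one streamed API call (all times in seconds).
    request_sent: float        # when the request was issued
    token_times: list[float]   # arrival time of each streamed output token


def time_to_first_token(t: StreamTiming) -> float:
    """Latency: delay between sending the request and receiving the first token."""
    return t.token_times[0] - t.request_sent


def output_throughput(t: StreamTiming) -> float:
    """Tokens per second measured over the tokens that arrive after the first one."""
    n = len(t.token_times)
    if n < 2:
        raise ValueError("need at least two tokens to measure throughput")
    return (n - 1) / (t.token_times[-1] - t.token_times[0])


def estimated_response_time(ttft: float, tokens_per_s: float, n_tokens: int) -> float:
    """Rough end-to-end time for an n_tokens reply: the first token after ttft,
    then the remaining n_tokens - 1 at a steady tokens_per_s (an assumed simple model)."""
    return ttft + (n_tokens - 1) / tokens_per_s
```

Under this simple model, the ~0.5 s latency and ~21 tokens/s quoted above give `estimated_response_time(0.5, 21.0, 100)` of roughly 5.2 seconds for a 100-token reply. Real responses will vary with prompt length and load, which is one reason to benchmark repeatedly rather than once.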