Congratulations to @MistralAI on reducing latency (time to first token) by up to 10X on their API! Mistral have cut latency on their own API from as much as 11 seconds to under 1 second. This is especially important because Mistral's API is the only place to get Mistral Medium, potentially the highest-quality model available other than GPT-4. Artificial Analysis now assesses Mistral Medium as viable for performance-sensitive production applications, with consistent throughput of ~21 tokens/s and latency of ~0.5 seconds. We benchmark all supported LLM APIs 8 times per day. For more details, check out artificialanalysis.ai/models/mistral…
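
For readers unfamiliar with the two metrics quoted here, the following is a minimal sketch of how time to first token and throughput could be computed from the arrival times of a streamed response. It is not Artificial Analysis's benchmark code: the post does not describe the methodology, so the definition of throughput used here (tokens received after the first one, divided by the time elapsed since the first one) is an assumption, and the names `StreamMetrics` and `stream_metrics` are hypothetical. The timestamps in the usage lines are synthetic values chosen to mirror the figures above, not measurements.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StreamMetrics:
    ttft_s: float        # latency: seconds from sending the request to the first token
    tokens_per_s: float  # throughput over the generation window (assumed definition)


def stream_metrics(request_sent_s: float, token_times_s: Sequence[float]) -> StreamMetrics:
    """Compute latency and throughput from token arrival timestamps (in seconds)."""
    if not token_times_s:
        raise ValueError("no tokens received")
    first, last = token_times_s[0], token_times_s[-1]
    ttft = first - request_sent_s
    window = last - first
    # Count only tokens after the first one, so latency does not distort throughput.
    rate = (len(token_times_s) - 1) / window if window > 0 else float("nan")
    return StreamMetrics(ttft_s=ttft, tokens_per_s=rate)


# Synthetic example: first token at 0.5 s, then 21 more tokens over the next second.
arrivals = [0.5 + i / 21 for i in range(22)]
metrics = stream_metrics(0.0, arrivals)
print(metrics.ttft_s, metrics.tokens_per_s)
```

Separating the two numbers this way is why a large drop in latency can happen without any change in throughput: the first measures the wait before output starts, the second measures the pace once it has.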