Scribe v2 Realtime Speech to Text - 150ms Latency API

Transcribe live speech instantly

Scribe v2 Realtime is the most accurate real-time transcription model, with ~150 ms latency across 90+ languages, and is available via API.

Introducing Scribe v2 Realtime, built for speed and accuracy

Ultra-fast, ultra-accurate, and built for live speech. Scribe v2 Realtime delivers instant transcription for agents, meetings, and conversational AI.

High Accuracy

Trained on diverse global data and fine-tuned for natural speech, Scribe achieves industry-best Word Error Rates across major languages and accents.

Scribe beats all competing models in accuracy benchmarks

Ultra-low Latency

Stream audio and receive transcriptions in ~150 ms, enabling real-time understanding for live agents, meetings, and conversational AI.
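
As a rough illustration of what streaming audio involves on the client side, the sketch below splits raw PCM audio into small fixed-duration chunks that could be sent one at a time over a streaming connection. The 16 kHz sample rate, 16-bit mono format, 100 ms chunk size, and the helper name `chunk_pcm` are all assumptions made for illustration. This page does not specify the API's audio format or transport, so no actual Scribe API call is shown.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000   # assumed sample rate, not taken from this page
BYTES_PER_SAMPLE = 2      # assumed 16-bit mono PCM
CHUNK_MS = 100            # assumed chunk duration


def chunk_pcm(audio: bytes, chunk_ms: int = CHUNK_MS) -> Iterator[bytes]:
    """Yield consecutive fixed-duration slices of raw PCM audio.

    Hypothetical helper: the final chunk may be shorter than the rest.
    """
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(audio), chunk_bytes):
        yield audio[start:start + chunk_bytes]


# One second of silence, split into 100 ms pieces.
one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
chunks = list(chunk_pcm(one_second))
```

Smaller chunks generally mean the service can begin transcribing sooner, at the cost of more messages per second. The right size depends on the transport and on the service's own guidance.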

Real-time speech for agents, apps, and every language

Speech recognition engineered for real-time performance

Enterprise-grade security and infrastructure at scale

Unmatched accuracy, even in the most complex environments

Built for every workflow, from agents to production

Flexible pricing based on your needs

Experience best-in-class accuracy and responsiveness with pricing designed to scale from startups to enterprise teams.

$0.28 per hour or lower on annual Business plans
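
To make the pricing concrete, here is a minimal cost estimate that treats the quoted $0.28 per hour as an upper-bound rate. The function name `estimate_cost_usd` is hypothetical, and the sketch assumes billing is proportional to audio duration, which this page does not confirm.

```python
RATE_USD_PER_HOUR = 0.28  # rate quoted on this page; actual rates may be lower


def estimate_cost_usd(audio_minutes: float,
                      rate_per_hour: float = RATE_USD_PER_HOUR) -> float:
    """Estimate transcription cost, assuming linear per-duration billing."""
    return round(audio_minutes / 60 * rate_per_hour, 2)


# 600 minutes (10 hours) of audio at the quoted rate.
ten_hours_cost = estimate_cost_usd(600)
```

At the quoted rate, ten hours of audio comes to about $2.80.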
