Voice Agent API
Multilingual voice agents over WebSocket with native tool calling, MCP support, and web search.
Text to Speech API
Choose from five distinct voices and multiple audio formats. Built for telephony and web.
Speech to Text API
Top-ranked in blind human evaluations across benchmarked languages. Handles accents and domain-specific terminology. Supports batch, streaming, and bidirectional modes.
| API | Description | Price | Rate Limits |
|---|---|---|---|
| Voice Agent API | Real-time voice conversations over WebSocket | $0.05 / min ($3.00 / hr) | 100 sessions / team |
| Text to Speech (Beta) | Convert text to natural speech | $4.20 / 1M characters | 3000 requests per minute / 50 requests per second; 100 sessions / team |
| Speech to Text (Beta) | Transcribe audio files and live streams | $0.10 / hr (batch); $0.20 / hr (streaming) | 600 requests per minute / 10 requests per second; 100 sessions / team |
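To make the listed prices concrete, here is a minimal cost-estimation sketch. It assumes billing is strictly linear in usage, with no minimum charge, rounding increment, or volume discount; the page does not state any of these, so treat the results as rough estimates. The function names are hypothetical and are not part of any SDK.

```python
from decimal import Decimal

# Prices in USD, copied from the pricing table above.
VOICE_AGENT_PER_MINUTE = Decimal("0.05")
TTS_PER_MILLION_CHARS = Decimal("4.20")
STT_PER_HOUR = {"batch": Decimal("0.10"), "streaming": Decimal("0.20")}


def voice_agent_cost(minutes):
    """Estimated Voice Agent cost for a number of session minutes."""
    return VOICE_AGENT_PER_MINUTE * Decimal(str(minutes))


def tts_cost(characters):
    """Estimated Text to Speech cost for a number of input characters."""
    return TTS_PER_MILLION_CHARS * Decimal(str(characters)) / Decimal("1000000")


def stt_cost(hours, mode="batch"):
    """Estimated Speech to Text cost; mode is 'batch' or 'streaming'."""
    if mode not in STT_PER_HOUR:
        raise ValueError(f"unknown mode: {mode!r}")
    return STT_PER_HOUR[mode] * Decimal(str(hours))


if __name__ == "__main__":
    # One hour of voice agent sessions matches the $3.00 / hr figure in the table.
    print(voice_agent_cost(60))  # 3.00
    print(tts_cost(250_000))
    print(stt_cost(10, mode="streaming"))
```

`Decimal` is used instead of floats so that per-unit prices such as $0.05 do not pick up binary rounding error when multiplied across large usage figures.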
Production-ready infrastructure
SOC 2 Type II
Audited controls for security, availability, and confidentiality.
HIPAA Eligible
BAA available for healthcare applications handling protected health information.
GDPR Compliant
Data processing agreements and EU data residency options.
High Availability
Multi-region infrastructure for enterprise workloads.
Custom rate limits
Concurrent session and request limits scaled to your traffic.
SSO & access controls
SAML SSO, role-based access, and audit logging for your team.