xAI — Creators of Grok, the AI Chatbot


Voice Agent API

Multilingual voice agents over WebSocket with native tool calling, MCP support, and web search.

Text to Speech API

Choose from five distinct voices and multiple audio formats. Built for telephony and web.

Speech to Text API

Top-ranked in blind human evaluations across benchmarked languages. Handles accents and domain-specific terminology. Supports batch, streaming, and bidirectional modes.

Simple, transparent pricing

Clear, usage-based pricing for every voice API. No hidden fees.

Voice Agent API

Real-time voice conversations over WebSocket

Price: $0.05 / min ($3.00 / hr)
Rate limits: 100 sessions / team

Text to Speech (Beta)

Convert text to natural speech

Price: $4.20 / 1M characters
Rate limits: 3,000 requests per minute (rpm) / 50 requests per second (rps); 100 sessions / team

Speech to Text (Beta)

Transcribe audio files and live streams

Price: $0.10 / hr (batch), $0.20 / hr (streaming)
Rate limits: 600 requests per minute (rpm) / 10 requests per second (rps); 100 sessions / team
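
Because every rate above is usage-based, a monthly bill can be approximated with simple arithmetic. The Python sketch below is illustrative only: the function name `estimate_monthly_cost` and its parameters are hypothetical and not part of any xAI SDK, and it assumes usage is billed linearly at the listed rates with no minimums, rounding rules, or volume discounts.

```python
# Rates copied from the pricing list above, in USD.
VOICE_AGENT_PER_MIN = 0.05        # Voice Agent API, per minute
TTS_PER_MILLION_CHARS = 4.20      # Text to Speech, per 1M characters
STT_BATCH_PER_HOUR = 0.10         # Speech to Text, batch, per hour
STT_STREAMING_PER_HOUR = 0.20     # Speech to Text, streaming, per hour


def estimate_monthly_cost(
    agent_minutes: float = 0.0,
    tts_characters: int = 0,
    stt_batch_hours: float = 0.0,
    stt_streaming_hours: float = 0.0,
) -> float:
    """Return an approximate monthly cost in USD.

    Hypothetical helper: assumes strictly linear, usage-based billing.
    """
    total = (
        agent_minutes * VOICE_AGENT_PER_MIN
        + (tts_characters / 1_000_000) * TTS_PER_MILLION_CHARS
        + stt_batch_hours * STT_BATCH_PER_HOUR
        + stt_streaming_hours * STT_STREAMING_PER_HOUR
    )
    return round(total, 2)


if __name__ == "__main__":
    # Example workload: 600 agent minutes, 2M synthesized characters,
    # 100 hours of batch and 50 hours of streaming transcription.
    print(
        estimate_monthly_cost(
            agent_minutes=600,
            tts_characters=2_000_000,
            stt_batch_hours=100,
            stt_streaming_hours=50,
        )
    )
```

Keeping the rates as named constants makes it easy to update the estimate if the published prices change.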


Hear it for yourself

Call Grok and have a conversation.

844-448-4765

Production-ready infrastructure

SOC 2 Type II

Audited controls for security, availability, and confidentiality.

HIPAA Eligible

BAA available for healthcare applications handling protected health information.

GDPR Compliant

Data processing agreements and EU data residency options.

High Availability

Multi-region infrastructure for enterprise workloads.

Custom rate limits

Concurrent session and request limits scaled to your traffic.

SSO & access controls

SAML SSO, role-based access, and audit logging for your team.