Custom Voices and Voice Library


Today, we're introducing Custom Voices. Clone your voice from a few seconds of audio and use it instantly across the Grok Text to Speech and Voice Agent APIs.

Alongside Custom Voices, the new Voice Library gives your team a single place to browse, preview, and manage all your voices from the xAI console.

Use Cases

Custom Voices unlock a new class of applications.

Custom Voices

Clone your voice in under two minutes. Use it everywhere.

Record about a minute of natural speech in the xAI console. Our pipeline verifies you're the voice owner, processes your recording, and delivers a production-ready voice model, all in under two minutes. Your custom voice inherits every TTS capability: speech tags, multilingual output, and both REST and WebSocket streaming.

Custom voices work everywhere our built-in voices do. Pass the voice_id to any TTS endpoint or use it with the Voice Agent API for real-time conversational agents.
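
As a rough illustration of passing a voice_id to a TTS endpoint, here is a minimal Python sketch that builds a request without sending it. The endpoint URL, the JSON field names (text, voice_id), and the bearer-token header are assumptions made for the example and are not taken from this post; check the xAI API reference for the actual request shape.

```python
import json
import urllib.request

# Assumed endpoint, for illustration only; confirm the real path in the API docs.
TTS_URL = "https://api.x.ai/v1/tts"


def build_tts_request(text, voice_id, api_key, url=TTS_URL):
    """Build (but do not send) a POST request that selects a voice by its ID."""
    # Field names below are assumptions, not a documented schema.
    payload = {"text": text, "voice_id": voice_id}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# A custom voice is selected the same way as a built-in one: by its ID.
req = build_tts_request(
    "Hello from my cloned voice.",
    voice_id="my-custom-voice-id",  # hypothetical ID
    api_key="XAI_API_KEY",          # placeholder, not a real key
)
```

Sending the request (for example with urllib.request.urlopen) is left out so the sketch runs without credentials or network access.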

Voice Safety

Every custom voice goes through a two-stage verification process before it can be created. First, the speaker reads a verification phrase that our STT engine transcribes and matches in real time, confirming intent and presence. Then we compute speaker embeddings from the verification clip and the full recording to confirm they belong to the same person.

You can't clone a voice from a pre-existing recording, and you can't clone someone else's voice.

Passphrase Check

Read a verification phrase aloud. Our STT engine transcribes and matches it in real time, verifying your consent and presence.
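
To make the idea of matching a transcript against a passphrase concrete, here is a small Python sketch. It is not the production check: the normalization rules, the use of difflib for fuzzy matching, and the 0.9 threshold are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text, drop punctuation, and collapse whitespace."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(words)


def passphrase_matches(expected, transcript, threshold=0.9):
    """Return True when the transcript is close enough to the expected phrase.

    The threshold is an illustrative value; it tolerates small
    transcription slips while rejecting unrelated speech.
    """
    ratio = SequenceMatcher(
        None, normalize(expected), normalize(transcript)
    ).ratio()
    return ratio >= threshold
```

With this sketch, passphrase_matches("The quick brown fox.", "the quick, brown fox") returns True because both strings normalize to the same words, while an unrelated transcript falls below the threshold.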

Speaker Similarity

Speaker embeddings from the passphrase and the full recording are compared to confirm they belong to the same person.
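
One common way to compare two speaker embeddings is cosine similarity against a threshold. The Python sketch below shows that approach under stated assumptions: the post does not say which similarity measure or threshold is used, and the short vectors in the usage note are toy values, not real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as matching nothing
    return dot / (norm_a * norm_b)


def same_speaker(passphrase_emb, recording_emb, threshold=0.75):
    """Illustrative decision rule; the threshold is an assumed value."""
    return cosine_similarity(passphrase_emb, recording_emb) >= threshold
```

For example, same_speaker([0.9, 0.1, 0.3], [0.8, 0.2, 0.3]) accepts two nearby toy vectors, while same_speaker([1, 0, 0], [0, 1, 0]) rejects orthogonal ones.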

Voice Library

The Voice Library is a new section in the xAI console that organizes every voice available to your team, with your custom creations alongside our built-in voices. Browse, preview, and manage voices from a single page.

We've expanded our built-in voice catalog to over 80 voices across 28 languages. Listen to any voice across different scenarios before choosing one for your application.

There is no extra charge for using custom voices with the Text to Speech or Voice Agent APIs.