Introducing the Realtime API

Update on August 28, 2025: We announced the general availability of the Realtime API. Learn more here.

Update on February 3, 2025: We no longer limit the number of simultaneous sessions on the Realtime API. Please refer to our docs⁠(opens in a new window) for the latest rate limits on the Realtime API.

Update on October 30, 2024: We've added five new voices with greater range and expressiveness. Cached pricing is now also available for text and audio inputs, lowering the price to $2.50/1M cached text input tokens and $20/1M cached audio input tokens. Learn more here⁠(opens in a new window).

Update on October 17, 2024: Audio inputs and outputs are now available in the Chat Completions API. Get started here⁠(opens in a new window).

Today, we're introducing a public beta of the Realtime API, enabling all paid developers to build low-latency, multimodal experiences in their apps. Similar to ChatGPT’s Advanced Voice Mode, the Realtime API supports natural speech-to-speech conversations using the six preset voices⁠(opens in a new window) already supported in the API.

We’re also introducing audio input and output in the Chat Completions API⁠(opens in a new window) to support use cases that don’t require the low-latency benefits of the Realtime API. With this update, developers can pass any text or audio inputs into GPT‑4o⁠ and have the model respond with their choice of text, audio, or both.

From language apps and educational software to customer support experiences, developers have already been leveraging voice experiences to connect with their users. Now with the Realtime API and soon with audio in the Chat Completions API, developers no longer have to stitch together multiple models to power these experiences. Instead, you can build natural conversational experiences with a single API call.