Build low-latency, multimodal LLM applications with the Realtime API.

The OpenAI Realtime API enables low-latency communication with models that natively support speech-to-speech interactions, as well as multimodal inputs (audio, images, and text) and outputs (audio and text). These APIs can also be used for realtime audio transcription.

Voice agents

One of the most common use cases for the Realtime API is building voice agents for speech-to-speech model interactions in the browser. Our recommended starting point for these types of applications is the Agents SDK for TypeScript, which uses a WebRTC connection to the Realtime model in the browser and a WebSocket connection when used on the server. Follow the voice agent quickstart to build Realtime agents in the browser, starting from an example like this:
import { RealtimeAgent, RealtimeSession } from "@openai/agents/realtime";

const agent = new RealtimeAgent({
  name: "Assistant",
  instructions: "You are a helpful assistant.",
});

const session = new RealtimeSession(agent);

// Automatically connects your microphone and audio output
await session.connect({
  apiKey: "<client-api-key>",
});

To use the Realtime API directly outside the context of voice agents, check out the other connection options below.

Connection methods

While building voice agents with the Agents SDK is the fastest path to one specific type of application, the Realtime API provides an entire suite of flexible tools for a variety of use cases. There are three primary supported interfaces for the Realtime API:

- WebRTC: ideal for browser and client-side interactions with a Realtime model.
- WebSocket: ideal for middle-tier server-side applications with consistent low-latency network connections.
- SIP: ideal for VoIP telephony connections.

Depending on how you'd like to connect to a Realtime model, check out one of the connection guides to get started. You'll learn how to initialize a Realtime session and how to interact with a Realtime model using client and server events.
API Usage

Once connected to a Realtime model using one of the methods above, learn how to interact with the model in the usage guides.
Beta to GA migration

There are a few key differences between the interfaces in the Realtime beta API and the recently released GA API. The topics below cover what you need to know when migrating from the beta interface to GA.

Remove the beta header

For REST API requests, WebSocket connections, and other interfaces with the Realtime API, beta users had to include the following header with each request:

OpenAI-Beta: realtime=v1

This header should be removed for requests to the GA interface. Include it only if you want to retain the behavior of the beta API.
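For example, a server-side WebSocket connection now only needs the Authorization header. A minimal sketch, assuming the ws package and an API key in OPENAI_API_KEY:

import WebSocket from "ws";

const url = "wss://api.openai.com/v1/realtime?model=gpt-realtime";

// Beta connections also sent "OpenAI-Beta": "realtime=v1" here.
// In GA, only the Authorization header is required:
const ws = new WebSocket(url, {
  headers: {
    Authorization: "Bearer " + process.env.OPENAI_API_KEY,
    // "OpenAI-Beta": "realtime=v1", // beta only; remove for GA
  },
});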
Generating ephemeral API keys

In the beta interface, there were multiple endpoints for generating ephemeral keys for either Realtime sessions or transcription sessions. In the GA interface, there is only one REST API endpoint used to generate keys: POST /v1/realtime/client_secrets.

To create a session and receive a client secret you can use to initialize a WebRTC or WebSocket connection on a client, request one like this with the appropriate session configuration:
const sessionConfig = JSON.stringify({
  session: {
    type: "realtime",
    model: "gpt-realtime",
    audio: {
      output: { voice: "marin" },
    },
  },
});

const response = await fetch("https://api.openai.com/v1/realtime/client_secrets", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: sessionConfig,
});

const data = await response.json();
console.log(data.value); // e.g. ek_68af296e8e408191a1120ab6383263c2

These tokens can safely be used in client environments like browsers and mobile applications.
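In practice, a backend typically mints the client secret and hands it to the browser. A minimal sketch, assuming a hypothetical /session route on a plain Node HTTP server (the route name and port are illustrative):

import http from "node:http";

// Hypothetical /session endpoint: mints an ephemeral client secret and
// returns it to the browser client.
http
  .createServer(async (req, res) => {
    if (req.method === "GET" && req.url === "/session") {
      const response = await fetch("https://api.openai.com/v1/realtime/client_secrets", {
        method: "POST",
        headers: {
          Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          session: { type: "realtime", model: "gpt-realtime" },
        }),
      });
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify(await response.json()));
    } else {
      res.writeHead(404);
      res.end();
    }
  })
  .listen(3000);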
New URL for WebRTC SDP data

When initializing a WebRTC session in the browser, the URL for obtaining remote session information via SDP is now /v1/realtime/calls:
const baseUrl = "https://api.openai.com/v1/realtime/calls";
const model = "gpt-realtime";

const sdpResponse = await fetch(baseUrl, {
  method: "POST",
  body: offer.sdp,
  headers: {
    Authorization: `Bearer YOUR_EPHEMERAL_KEY_HERE`,
    "Content-Type": "application/sdp",
  },
});

const sdp = await sdpResponse.text();
const answer = { type: "answer", sdp };
await pc.setRemoteDescription(answer);
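The pc and offer used in the snippet above come from a standard browser WebRTC setup. A minimal sketch, assuming a module script running in the browser with microphone access; the "oai-events" data channel name matches OpenAI's WebRTC examples:

// Peer connection to the Realtime model
const pc = new RTCPeerConnection();

// Play audio generated by the model
const audioEl = document.createElement("audio");
audioEl.autoplay = true;
pc.ontrack = (event) => {
  audioEl.srcObject = event.streams[0];
};

// Send microphone audio to the model
const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
pc.addTrack(mic.getTracks()[0], mic);

// Data channel for sending client events and receiving server events
const dc = pc.createDataChannel("oai-events");
dc.addEventListener("message", (event) => {
  console.log("server event:", JSON.parse(event.data));
});

// Create the local offer, then POST offer.sdp to /v1/realtime/calls as shown above
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);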
New event names and shapes

When creating or updating a Realtime session in the GA interface, you must now specify a session type, since the same client event is now used to create both speech-to-speech and transcription sessions. The options for the session type are:

- realtime for speech-to-speech
- transcription for realtime audio transcription

For example, over a server-side WebSocket connection:
import WebSocket from "ws";

const url = "wss://api.openai.com/v1/realtime?model=gpt-realtime";
const ws = new WebSocket(url, {
  headers: {
    Authorization: "Bearer " + process.env.OPENAI_API_KEY,
  },
});

ws.on("open", function open() {
  console.log("Connected to server.");

  // Send client events over the WebSocket once connected
  ws.send(
    JSON.stringify({
      type: "session.update",
      session: {
        type: "realtime",
        instructions: "Be extra nice today!",
      },
    })
  );
});
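The same client event with type: "transcription" creates a transcription session instead. A minimal sketch reusing the ws connection from the example above; transcription-specific options (input audio format, transcription model, and so on) are omitted here, so check the API reference for the full configuration:

ws.send(
  JSON.stringify({
    type: "session.update",
    session: {
      type: "transcription",
    },
  })
);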
Configuration for input modalities and other properties has moved as well, notably output audio configuration like the model voice. Check the API reference for the latest event shapes. For example, the output voice is now set under audio.output:
ws.on("open", function open() {
ws.send(
JSON.stringify({
type: "session.update",
session: {
type: "realtime",
model: "gpt-realtime",
audio: {
output: { voice: "marin" },
},
},
})
);
});
Finally, some event names have changed to reflect their new position in the event data model:

- response.text.delta → response.output_text.delta
- response.audio.delta → response.output_audio.delta
- response.audio_transcript.delta → response.output_audio_transcript.delta
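As an illustration, a server-side handler for the renamed output events might look like the sketch below, reusing the ws connection from above and assuming each of these delta events carries a delta field:

ws.on("message", (raw) => {
  const event = JSON.parse(raw.toString());

  switch (event.type) {
    case "response.output_text.delta": // was response.text.delta
      process.stdout.write(event.delta);
      break;
    case "response.output_audio.delta": // was response.audio.delta
      // event.delta carries a chunk of base64-encoded output audio
      break;
    case "response.output_audio_transcript.delta": // was response.audio_transcript.delta
      process.stdout.write(event.delta);
      break;
  }
});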
New conversation item events

For response.output_item, the API has always had both .added and .done events, but for conversation-level items the API previously only had .created, which by convention is emitted at the start, when the item is added. We have added .added and .done events to allow better ergonomics for developers when receiving events that need some loading time (such as MCP tool listing, or input audio transcriptions if these were to be modeled as items in the future).

Current event shape for conversation items added:
{
  "event_id": "event_1920",
  "type": "conversation.item.created",
  "previous_item_id": "msg_002",
  "item": Item
}

New events to replace the above:
{
  "event_id": "event_1920",
  "type": "conversation.item.added",
  "previous_item_id": "msg_002",
  "item": Item
}
{
  "event_id": "event_1920",
  "type": "conversation.item.done",
  "previous_item_id": "msg_002",
  "item": Item
}

Input and output item changes

All Items

The Realtime API sets an object=realtime.item param on all items in the GA interface.

Function Call Output

status: The Realtime API now accepts a no-op status field for the function call output item param. This aligns with the Responses API implementation.

Assistant Message Content

The type properties of output assistant messages now align with the Responses API:

- type=text → type=output_text (no change to the text field name)
- type=audio → type=output_audio (no change to the audio field name)
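To make these shapes concrete, here is a hypothetical pair of items as they might appear in the GA interface. The field values are illustrative; consult the API reference for the authoritative schemas:

// A function call output item: the status field is now accepted (as a no-op).
const functionCallOutput = {
  object: "realtime.item",
  type: "function_call_output",
  call_id: "call_abc123", // hypothetical call id
  output: JSON.stringify({ temperature: 21 }),
  status: "completed",
};

// An assistant message item: content parts use the Responses-style types.
const assistantMessage = {
  object: "realtime.item",
  type: "message",
  role: "assistant",
  content: [
    { type: "output_text", text: "It's 21 degrees right now." }, // was type: "text"
    // { type: "output_audio", transcript: "..." },              // was type: "audio"
  ],
};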