# OpenAI Realtime API with Twilio Quickstart
Combine OpenAI's Realtime API and Twilio's phone calling capability to build an AI calling assistant.
## Quick Setup
Open three terminal windows:
| Terminal | Purpose | Quick Reference (see below for more) |
|---|---|---|
| 1 | To run the webapp | `npm run dev` |
| 2 | To run the websocket-server | `npm run dev` |
| 3 | To run ngrok | `ngrok http 8081` |
Make sure all vars in `webapp/.env` and `websocket-server/.env` are set correctly. See the full setup section for more.
## Overview
This repo implements a phone calling assistant with the Realtime API and Twilio, and has two main parts: the webapp, and the websocket-server.
- `webapp`: NextJS app to serve as a frontend for call configuration and transcripts
- `websocket-server`: Express backend that handles connection from Twilio, connects it to the Realtime API, and forwards messages to the frontend
Twilio uses TwiML (a form of XML) to specify how to handle a phone call. When a call comes in, we tell Twilio to start a bi-directional stream to our backend, where we forward messages between the call and the Realtime API. (`{{WS_URL}}` is replaced with our websocket endpoint.)
```xml
<!-- TwiML to start a bi-directional stream -->
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>Connected</Say>
  <Connect>
    <Stream url="{{WS_URL}}" />
  </Connect>
  <Say>Disconnected</Say>
</Response>
```
We use ngrok to make our server reachable by Twilio.
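Concretely, the webhook's job boils down to a template substitution before the TwiML is returned to Twilio. A minimal sketch, assuming the template above (`twimlFor` is an illustrative name, not necessarily the repo's actual helper):

```typescript
// Sketch of what the /twiml endpoint returns: the TwiML template from
// above, with {{WS_URL}} replaced by our public websocket endpoint.
// twimlFor is an illustrative name, not the repo's actual helper.
const TWIML_TEMPLATE = `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>Connected</Say>
  <Connect>
    <Stream url="{{WS_URL}}" />
  </Connect>
  <Say>Disconnected</Say>
</Response>`;

function twimlFor(wsUrl: string): string {
  return TWIML_TEMPLATE.replace("{{WS_URL}}", wsUrl);
}
```

In the real server this string is served from the `/twiml` webhook endpoint, with the websocket URL derived from the server's public address.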
## Life of a phone call
### Setup
- We run ngrok to make our server reachable by Twilio
- We set the Twilio webhook to our ngrok address
- Frontend connects to the backend (`wss://[your_backend]/logs`), ready for a call
### Call
- Call is placed to Twilio-managed number
- Twilio queries the webhook (`http://[your_backend]/twiml`) for TwiML instructions
- Twilio opens a bi-directional stream to the backend (`wss://[your_backend]/call`)
- The backend connects to the Realtime API, and starts forwarding messages:
  - between Twilio and the Realtime API
  - between the frontend and the Realtime API
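The forwarding step amounts to translating between the two message shapes. A minimal sketch, based on the documented Twilio Media Streams and Realtime API event formats (the helper names are illustrative, not the repo's actual functions):

```typescript
// Twilio sends "media" events whose payload is base64-encoded audio.
interface TwilioMediaEvent {
  event: string;
  streamSid?: string;
  media?: { payload: string };
}

// Forward caller audio to the model as an input_audio_buffer.append event.
// Returns null for non-media events (connected, start, stop, ...).
function twilioToRealtime(msg: TwilioMediaEvent): object | null {
  if (msg.event !== "media" || !msg.media) return null;
  return { type: "input_audio_buffer.append", audio: msg.media.payload };
}

// Forward model audio deltas back to the call as Twilio "media" messages.
function realtimeToTwilio(
  evt: { type: string; delta?: string },
  streamSid: string
): object | null {
  if (evt.type !== "response.audio.delta" || !evt.delta) return null;
  return { event: "media", streamSid, media: { payload: evt.delta } };
}
```

In practice the backend also handles stream start/stop events, session configuration, and interruption, but the audio relay is the core of the loop.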
## Function Calling
This demo mocks out function calls so you can provide sample responses. In reality you could handle the function call, execute some code, and then supply the response back to the model.
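For reference, supplying a result back to the model takes two Realtime API events: a `function_call_output` conversation item, followed by `response.create` so the model continues. A sketch of building those events (the `functionCallResponse` helper is illustrative):

```typescript
// Shape of a function call the model emits (arguments is a JSON string).
interface FunctionCall {
  call_id: string;
  name: string;
  arguments: string;
}

// Build the two events that supply a (possibly mocked) result to the model:
// a function_call_output item tied to the call_id, then response.create.
function functionCallResponse(call: FunctionCall, result: unknown): object[] {
  return [
    {
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: call.call_id,
        output: JSON.stringify(result),
      },
    },
    { type: "response.create" },
  ];
}
```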
## Full Setup

1. Make sure your auth & env are configured correctly.

2. Run webapp.

   ```shell
   cd webapp
   npm install
   npm run dev
   ```

3. Run websocket server.

   ```shell
   cd websocket-server
   npm install
   npm run dev
   ```

## Detailed Auth & Env
### OpenAI & Twilio
Set your credentials in `webapp/.env` and `websocket-server/.env` - see `webapp/.env.example` and `websocket-server/.env.example` for reference.
### Ngrok
Twilio needs to be able to reach your websocket server. If you're running it locally, your ports are inaccessible by default. ngrok can make them temporarily accessible.
We have set the websocket-server to run on port 8081 by default, so that is the port we will be forwarding.
Make note of the Forwarding URL. (e.g. https://54c5-35-170-32-42.ngrok-free.app)
### Websocket URL
Your server should now be accessible at the Forwarding URL when run, so set the `PUBLIC_URL` in `websocket-server/.env`. See `websocket-server/.env.example` for reference.
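For example, `websocket-server/.env` might look like the following (variable names other than `PUBLIC_URL` are assumptions; check `websocket-server/.env.example` for the actual names your copy expects):

```shell
# websocket-server/.env
OPENAI_API_KEY="sk-..."   # assumed variable name; see .env.example
PUBLIC_URL="https://54c5-35-170-32-42.ngrok-free.app"  # your ngrok Forwarding URL
```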
## Additional Notes
This repo isn't polished, and its security practices leave something to be desired. Please use it only as a reference, and make sure to audit your app for security and engineering best practices before deploying!