GitHub - skorotkiewicz/vibevoice-podcast: Create multi-voice podcasts with AI text-to-speech

🎙️ Podcast Maker

Create multi-voice podcasts with AI text-to-speech. Add segments, assign different voices to each, preview in real-time, and export as audio.

Features

Multi-segment editing — Add unlimited text segments
Per-segment voices — Assign different voices to each segment
Audio pre-generation — Generate audio clips before playback for consistency
Audio caching — Generated clips are cached for instant replay
Real-time preview — Play individual segments or the entire podcast
Audio export — Download your podcast as a single WAV file
Project management — Export/Import podcasts as JSON

Demo

Sample output: example/podcast-1765734509668.wav

Prerequisites

1. Clone VibeVoice

git clone https://github.com/microsoft/VibeVoice
cd VibeVoice

2. Start the API Server

From this repository, copy the server script into the demo/ folder of your VibeVoice checkout:

cp example/server.py /path/to/VibeVoice/demo/server.py

Run the server:

cd /path/to/VibeVoice
python demo/server.py --model microsoft/VibeVoice-Realtime-0.5B --device cuda --port 8880

The server will start at http://localhost:8880.

Running the Frontend

# Install dependencies
bun install   # or npm install

# Start development server
bun dev       # or npm run dev

Open http://localhost:5173 in your browser.

Usage

Creating a Podcast

Add segments — Click "Add Segment" to create new text entries
Write content — Enter text for each segment
Select voices — Choose a voice from the dropdown for each segment
Generate audio — Click "Generate" on each segment to create the audio clip
Preview — Click "Play" on individual segments or "Play Podcast" for all

Generating Audio

Each segment has a Generate / Regenerate button and a status indicator. A Generate All action covers the whole podcast:

Generate — Creates the audio clip for that segment (amber button)
Regenerate — Re-creates the audio if you want a different take
Generate All — Generates all missing audio clips at once
Ready — Green indicator shows the segment has cached audio
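The "Generate All" behavior can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the `Segment` shape, the `Synthesize` callback, and both function names are assumptions made for the example.

```typescript
// Hypothetical shapes; the app's real state model may differ.
interface Segment {
  id: string;
  text: string;
  voice: string;
  audio?: Uint8Array; // present once a clip has been generated
}

// Assumed signature for whatever function calls the TTS backend.
type Synthesize = (text: string, voice: string) => Promise<Uint8Array>;

// Segments that still need a clip: no audio yet and some text to speak.
function missingSegments(segments: Segment[]): Segment[] {
  return segments.filter((s) => s.audio === undefined && s.text.trim() !== "");
}

// Generate clips one at a time so a single TTS server is not flooded.
// Returns the ids of segments whose generation failed.
async function generateAll(
  segments: Segment[],
  synthesize: Synthesize,
): Promise<string[]> {
  const failed: string[] = [];
  for (const segment of missingSegments(segments)) {
    try {
      segment.audio = await synthesize(segment.text, segment.voice);
    } catch {
      failed.push(segment.id);
    }
  }
  return failed;
}
```

Generating sequentially is a conservative choice for a single local GPU server. Whether the real app runs requests in sequence or in parallel is not stated here.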

Tip: Pre-generating audio ensures consistent playback. Speech synthesis can vary from one run to the next, so each generated clip is cached and replays identically until you regenerate it or edit the segment. No more "I feel lucky" randomness!

Playback

If a segment has cached audio, it plays instantly
If a segment has no cached audio, the clip is generated first and then played
The cache is invalidated when you change text or voice
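One way to get these playback rules is to key the cache on the fields that affect the audio. The sketch below assumes a hypothetical `Segment` shape and an injected `synthesize` callback. `ClipCache`, `cacheKey`, and `getOrGenerate` are illustrative names, not identifiers from src/App.tsx.

```typescript
// Hypothetical segment shape used only for this example.
interface Segment {
  id: string;
  text: string;
  voice: string;
}

// Build the key from voice and text, so editing either one misses the cache.
// JSON.stringify avoids collisions that a plain separator could produce.
function cacheKey(segment: Segment): string {
  return JSON.stringify([segment.voice, segment.text]);
}

class ClipCache {
  private clips = new Map<string, Uint8Array>();

  has(segment: Segment): boolean {
    return this.clips.has(cacheKey(segment));
  }

  get(segment: Segment): Uint8Array | undefined {
    return this.clips.get(cacheKey(segment));
  }

  set(segment: Segment, audio: Uint8Array): void {
    this.clips.set(cacheKey(segment), audio);
  }
}

// Return cached audio when present; otherwise synthesize, store, and return it.
async function getOrGenerate(
  segment: Segment,
  cache: ClipCache,
  synthesize: (segment: Segment) => Promise<Uint8Array>,
): Promise<Uint8Array> {
  const cached = cache.get(segment);
  if (cached !== undefined) return cached;
  const audio = await synthesize(segment);
  cache.set(segment, audio);
  return audio;
}
```

With this design a changed segment simply never finds its old clip. A real implementation might also delete stale entries to free memory, and reverting an edit would hit the old entry again, which may or may not match the app's behavior.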

Exporting

JSON — Save your project with "Export JSON" to continue editing later
Audio — Click "Download Audio" to export the complete podcast
- Uses cached audio where available (fast!)
- Generates missing segments automatically
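Joining clips into a single WAV file could look something like the sketch below. It assumes every clip has already been decoded to 16-bit PCM with the same sample rate and channel count. The 24000 Hz default and the `encodeWav` name are assumptions for illustration, not values taken from the app.

```typescript
// Concatenate 16-bit PCM clips and wrap them in a standard 44-byte WAV header.
function encodeWav(
  clips: Int16Array[],
  sampleRate: number = 24000, // assumed default; use the rate your clips actually have
  channels: number = 1,
): Uint8Array {
  const totalSamples = clips.reduce((count, clip) => count + clip.length, 0);
  const dataBytes = totalSamples * 2; // 2 bytes per 16-bit sample
  const buffer = new ArrayBuffer(44 + dataBytes);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * channels * 2, true); // byte rate
  view.setUint16(32, channels * 2, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataBytes, true);

  // Append the samples of each clip in order, little-endian.
  let offset = 44;
  for (const clip of clips) {
    clip.forEach((sample) => {
      view.setInt16(offset, sample, true);
      offset += 2;
    });
  }
  return new Uint8Array(buffer);
}
```

In a browser, the returned bytes could be wrapped in a `Blob` with type `audio/wav` and offered as a download.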

Importing

Click "Import" and select a previously saved JSON file to restore your project.
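As a rough illustration of the export/import round trip, the sketch below serializes a project and validates it on the way back in. The `Project` shape shown here is hypothetical. The real JSON files in example/ may use different field names, so check one of them before relying on this.

```typescript
// Hypothetical project format: a list of segments with text and voice.
interface ProjectSegment {
  text: string;
  voice: string;
}

interface Project {
  segments: ProjectSegment[];
}

// Serialize with indentation so the file stays readable and diff-friendly.
function exportProject(project: Project): string {
  return JSON.stringify(project, null, 2);
}

// Parse and validate, so a malformed file fails with a clear message
// instead of producing broken state later.
function importProject(json: string): Project {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("Project file must contain a JSON object");
  }
  const segments = (data as { segments?: unknown }).segments;
  if (!Array.isArray(segments)) {
    throw new Error("Project file has no segments array");
  }
  return {
    segments: segments.map((raw: unknown, index: number) => {
      const { text, voice } = (raw ?? {}) as { text?: unknown; voice?: unknown };
      if (typeof text !== "string" || typeof voice !== "string") {
        throw new Error(`Segment ${index} is missing text or voice`);
      }
      return { text, voice };
    }),
  };
}
```

Audio clips are left out of this sketch on purpose: they can be regenerated from text and voice, which keeps the project file small.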

Example Podcasts

Check the example/ folder for sample podcast JSON files you can import.

Configuration

The frontend connects to the API at http://localhost:8880/api. To change this, edit the API_BASE constant in src/App.tsx.
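For orientation, the constant and a small URL helper might look like the sketch below. Only the `API_BASE` name and its default value come from this README. The `apiUrl` helper and the `voices` path in the usage note are hypothetical and may not match the real server routes.

```typescript
// Default from this README; change it if the server runs on another host or port.
const API_BASE = "http://localhost:8880/api";

// Join the base and a path without producing doubled or missing slashes.
function apiUrl(path: string, base: string = API_BASE): string {
  return `${base.replace(/\/+$/, "")}/${path.replace(/^\/+/, "")}`;
}
```

For example, `apiUrl("voices")` and `apiUrl("/voices")` both yield `http://localhost:8880/api/voices`.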

Tech Stack

Frontend: React + TypeScript + Vite + Tailwind CSS
Backend: FastAPI + VibeVoice TTS
Audio: REST API synthesis with client-side caching

License

MIT