GitHub - podcast-lm/the-ai-podcast: An open-source AI podcast creator

2 min read Original article ↗

the-ai-podcast

An open-source AI podcast creator

To use this script, you'll need your own Google Gemini/Anthropic/OpenAI and ElevenLabs API keys. To create a podcast, follow these steps:

pip install pydub anthropic google-generativeai openai
export GOOGLE_API_KEY=<your-google-api-key>
export ELEVENLABS_API_KEY=<your-elevenlabs-api-key>
python podcast.py \
    --input the-ai-podcast/episode-01-audio-lm/audio-lm.txt \
    --output-dir the-ai-podcast/episode-01-audio-lm \
    --model-name gemini-1.5-flash-002 \
    --llm-provider google \
    --save-traces \
    --voice Callum

You can find an example of a generated podcast at the-ai-podcast/episode-01-audio-lm/audio.mp3.

This diagram illustrates the podcast creation process:

graph TD
    subgraph "Content Analysis"
    C[Extract Metadata & Summary] --> D[Generate & Answer Questions]
    end

    subgraph "Script Creation"
        D --> E[Create & Improve Monologue]
        E --> G[Get & Apply Feedback]
    end

    subgraph "Output Generation"
        G --> I[Generate Audio]
    end

Loading

Limitations

Currently, only one source document is supported for podcast creation. If you want to create a podcast from multiple sources, combine them into a single document.

Responsible Use

Be mindful of the generated content, as you are responsible for what it communicates.
Inform your audience that the podcast is generated by AI.

Changelog

2024-11-09

Added support for OpenAI

2024-11-07

Added options to generate only the script or only the audio
Incorporated three rounds of LLM feedback to improve the script
Cleaned up and improved prompts

2024-11-04

Added Gemini 1.5 Flash support (faster and cheaper)
Improved quality through listener feedback
Used an LLM call to clean up the final script before voice generation
Unified the LLM interface for Anthropic and Google Gemini
Separated the host background and show format into dedicated files

2024-11-03

Added support for Google Gemini as an LLM provider
Added LLM trace saving for debugging and analysis
Added host personality to make podcasts more engaging and natural