GitHub - sliday/ytsum: TLDW: Awesome Video Summarizer

Awesome YouTube Video Summary/Podcast/Video

A Python script to generate summaries (Claude), podcasts (Whisper), and videos (RunwayML or Luma AI) from annoyingly long YouTube content.

Example

Original video: https://www.youtube.com/watch?v=_K-L9uhsBLM
Summary: https://dl.dropbox.com/scl/fi/mdkbglfbs4m9ydeo9a2k7/video-_K-L9uhsBLM.mp4?rlkey=3wrowryg9gio1walaxhdbp2is&dl=0

Features

Generate concise summaries of YouTube videos
Create engaging podcast scripts with multiple voices
Generate AI-powered videos with synchronized podcast audio
Support for multiple languages
Multiple transcription options
Multiple video generation providers

Installation

Clone the repository:

git clone https://github.com/sliday/ytsum.git
cd ytsum

Install dependencies:

pip install -r requirements.txt

Install FFmpeg (required for audio/video processing):
- macOS: brew install ffmpeg
- Ubuntu/Debian: sudo apt-get install ffmpeg
- Windows: Download from FFmpeg website

Environment Setup

Create a .env file with your API keys:

ANTHROPIC_API_KEY=your_claude_api_key
OPENAI_API_KEY=your_openai_api_key
LUMAAI_API_KEY=your_lumaai_api_key
RUNWAYML_API_SECRET=your_runwayml_api_key
REPLICATE_API_TOKEN=your_replicate_api_key

Usage

Basic Summary

python ytsum.py "https://www.youtube.com/watch?v=VIDEO_ID"

Generate Podcast

python ytsum.py --podcast "https://www.youtube.com/watch?v=VIDEO_ID"

Generate Video with Podcast

# Using Luma AI (faster, recommended)
python ytsum.py --podcast --lumaai "https://www.youtube.com/watch?v=VIDEO_ID"

# Using RunwayML
python ytsum.py --podcast --runwayml "https://www.youtube.com/watch?v=VIDEO_ID"

Additional Options

--language: Specify output language (default: english)
--ignore-subs: Force transcription even when subtitles exist
--fast-whisper: Use Fast Whisper for transcription (faster)
--whisper: Use OpenAI Whisper for transcription (more accurate)
--replicate: Use Replicate's Incredibly Fast Whisper

Output Files

All output files are saved in the out directory:

summary-{video_id}.txt: Text summary
podcast-{video_id}.txt: Podcast script
podcast-{video_id}.mp3: Podcast audio
video-{video_id}.mp4: Final video with podcast audio

Video Generation

The tool supports two AI video generation providers:

Luma AI (Recommended)

Faster generation times
High-quality cinematic videos
Supports camera movements and scene transitions
Maintains visual consistency
Optional image input for style reference

RunwayML

High-quality video generation
Requires input image
Longer processing times
Professional-grade output

Both providers:

Generate base images using Flux AI
Create video segments based on podcast content
Combine segments with audio
Support custom duration and aspect ratio

Transcription Options

Fast Whisper (Default)
- Quick transcription
- Good accuracy
- No API key required
OpenAI Whisper
- High accuracy
- Slower processing
- Requires OpenAI API key
Replicate Whisper
- Fastest option
- Good accuracy
- Requires Replicate API key

Testing

Run the test suite:

Run specific test groups:

# Run Luma AI tests only
pytest -v -m luma

# Run RunwayML tests only
pytest -v -m runway

Dependencies

anthropic: Claude API for text generation
openai: Whisper API for transcription and TTS
lumaai: Luma AI for video generation (recommended)
runwayml: RunwayML for video generation
replicate: Flux AI for image generation
ffmpeg-python: Audio/video processing
colorama: Terminal output formatting
pytest: Testing framework

Contributing

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.