Homer Simpson Singing Detector
Detects Homer Simpson singing in audio files using speaker embedding similarity and a trained gradient boosting classifier. Point it at your music library, and it will find (and optionally quarantine) any tracks where Homer is belting one out.
How it works
Audio File
|
v
[HDemucs Vocal Separation] ─── strips instrumentals, isolates vocals
|
v
[3-second Sliding Windows] ─── energy-gated, 50% overlap
|
v
[Resemblyzer Speaker Embeddings] ─── 256-D voice fingerprint per window
|
v
[GBM Classifier] ─── 13 features: 5 similarity + 8 acoustic (MFCCs, pitch, spectral)
|
v
[Detection Decision] ─── peak/mean probability thresholds
|
v
HOMER DETECTED / NOT DETECTED
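The windowing stage in the diagram can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `sliding_windows`, the RMS-based energy gate, and the `min_rms` cutoff are assumptions; only the 3-second window and 50% overlap come from the pipeline above.

```python
import math
from typing import Iterator, Sequence, Tuple


def sliding_windows(
    samples: Sequence[float],
    sample_rate: int,
    window_s: float = 3.0,   # window length from the pipeline diagram
    overlap: float = 0.5,    # 50% overlap from the pipeline diagram
    min_rms: float = 0.01,   # assumed energy gate; the real cutoff is not documented here
) -> Iterator[Tuple[float, Sequence[float]]]:
    """Yield (start_time_seconds, window) for each window that passes the energy gate."""
    win = int(window_s * sample_rate)
    hop = max(1, int(win * (1.0 - overlap)))
    for start in range(0, len(samples) - win + 1, hop):
        chunk = samples[start:start + win]
        rms = math.sqrt(sum(x * x for x in chunk) / win)
        if rms >= min_rms:  # skip near-silent windows
            yield start / sample_rate, chunk
```

Gating on energy before embedding keeps silent stretches (common after vocal separation) from diluting the per-file statistics.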
Quick start
    ./setup_env.sh                                                   # 1. Install dependencies
    # Add Homer singing clips to reference_samples/                  # 2. Gather audio (see below)
    python create_reference.py --negative-dir validation/not_homer/  # 3. Build reference
    python detect.py --dir ~/Music/                                  # 4. Scan your music
Prerequisites
- Python 3.12+
- ~400 MB disk for model weights (downloaded automatically on first run):
  - Resemblyzer speaker encoder (~20 MB)
  - HDemucs vocal separation model (~350 MB)
- Audio files of Homer Simpson singing (you supply these — see below)
Setup
    # Clone and set up
    git clone <this-repo>
    cd homer-detector
    ./setup_env.sh

    # Or manually:
    python3 -m venv .venv
    source .venv/bin/activate
    pip install resemblyzer --no-deps  # avoids broken webrtcvad C extension
    pip install -r requirements.txt
Step 1: Gather Homer singing samples
You need short clips (5-30 seconds each) of Homer Simpson singing. These are used to build a voice profile and train the classifier.
Where to find them: Extract clips from episodes of The Simpsons that you own. Look for scenes where Homer is clearly singing, ideally without too much background noise or other characters singing simultaneously.
Good scenes to look for:
| Episode | Scene | Why it's good |
|---|---|---|
| S5E1 "Homer's Barbershop Quartet" | "Baby on Board" performance | Clear solo vocal lines |
| S4E9 "Mr. Plow" | Mr. Plow jingle | Short, clear, iconic |
| The Simpsons Movie | "Spider-Pig" | Well-known, clean vocal |
| S8E5 "Bart After Dark" | "We Put the Spring in Springfield" | Extended singing |
| S7E4 "Bart Sells His Soul" | "In-A-Gadda-Da-Vida" church organ | Distinctive vocal style |
| S6E12 "Homer the Great" | "We Do (The Stonecutters' Song)" | Group number with Homer prominent |
| Various | Karaoke scenes, musical numbers | Any scene with Homer singing |
File format: Any common audio format works (.wav, .mp3, .flac, .ogg, .m4a, .aac). Save clips to reference_samples/.
Tips:
- Include a variety of Homer's singing styles (loud belting, quiet crooning, off-key)
- Clips don't need to be perfectly clean — the vocal separator handles background music
- More clips = better voice profile, but even 5 clips work
Step 2: Gather non-Homer training samples
The classifier needs negative examples to learn what Homer does not sound like. Save these to validation/not_homer/.
Types of negatives to include (difficulty noted for each):
- Male baritone/crooner singers (hardest negatives — closest to Homer's vocal range): Rick Astley, Johnny Cash, Elvis Presley, Frank Sinatra, Leonard Cohen
- Male rock/pop singers: Beatles, Queen, Billy Joel, Elton John, David Bowie
- Other Simpsons characters (very hard): Bart singing, Mr. Burns, Krusty, ensemble numbers without Homer
- Female singers (easy negatives): Whitney Houston, Beyonce, Adele, Taylor Swift
- Rap/hip-hop: For spectral diversity
Aim for 5-30 second clips, same formats as above. Diversity matters more than quantity: a few clips from each category are better than many clips from one artist.
Step 3: Build the reference
    # Basic (synthetic negatives only)
    python create_reference.py

    # With real negative samples (recommended: much better precision)
    python create_reference.py --negative-dir validation/not_homer/

    # If your clips are already isolated vocals (no background music)
    python create_reference.py --skip-separation --negative-dir validation/not_homer/

    # Limit negative processing for speed
    python create_reference.py --negative-dir validation/not_homer/ --max-negatives 30
This will:
- Separate vocals from each clip (using HDemucs)
- Extract speaker embeddings (using Resemblyzer)
- Train a gradient boosting classifier on similarity + acoustic features
- Save everything to reference_embeddings/
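The similarity features rest on comparing a window's embedding against the reference embeddings. Here is a rough sketch of that comparison, assuming cosine similarity. The function names and the two summary statistics are hypothetical, and the five similarity features the classifier actually uses are not spelled out in this README.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_summary(
    window_embedding: Sequence[float],
    reference_embeddings: Sequence[Sequence[float]],
) -> Dict[str, float]:
    """Summarize how close one window is to the reference set (illustrative statistics only)."""
    if not reference_embeddings:
        raise ValueError("need at least one reference embedding")
    sims = [cosine_similarity(window_embedding, ref) for ref in reference_embeddings]
    return {"max": max(sims), "mean": sum(sims) / len(sims)}
```

In the real pipeline the vectors would be the 256-D Resemblyzer embeddings; the short vectors work the same way and are easier to check by hand.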
Step 4: Scan your music
    # Single file
    python detect.py song.mp3

    # Entire directory (recursive)
    python detect.py --dir ~/Music/playlist/

    # See per-window similarity breakdown
    python detect.py song.mp3 --timestamps

    # Quarantine mode: move detected files to HOMERS/ subfolder
    python detect.py --dir ~/Music/ --quarantine

    # Preview what would be moved (no files touched)
    python detect.py --dir ~/Music/ --quarantine --dry-run

    # Skip vocal separation (faster, less accurate)
    python detect.py --dir ~/Music/ --no-separation

    # Custom detection threshold
    python detect.py --dir ~/Music/ --threshold 0.8
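To make the quarantine and dry-run behavior concrete, here is a small sketch using only the standard library. It is an assumption-laden illustration rather than the code in detect.py: the `quarantine` function is hypothetical, and it assumes the HOMERS/ folder is created beside each detected file.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

QUARANTINE_DIRNAME = "HOMERS"  # folder name taken from the --quarantine description


def quarantine(detected: Iterable[Path], dry_run: bool = False) -> List[Path]:
    """Move each detected file into a HOMERS/ folder beside it and return the destinations.

    With dry_run=True nothing is moved; the planned moves are only printed.
    """
    destinations: List[Path] = []
    for src in detected:
        dest_dir = src.parent / QUARANTINE_DIRNAME  # assumed location of the subfolder
        dest = dest_dir / src.name
        destinations.append(dest)
        if dry_run:
            print(f"[dry-run] would move {src} -> {dest}")
            continue
        dest_dir.mkdir(exist_ok=True)
        shutil.move(str(src), str(dest))
    return destinations
```

Returning the planned destinations in both modes lets the dry run and the real run share one code path, so the preview cannot drift from what actually happens.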
Sample count guidance
| Category | Minimum | Recommended | Why |
|---|---|---|---|
| Homer reference clips | 5 | 8-15 | Voice profile + classifier training positives |
| Non-Homer training clips | 20 | 50+ | Classifier precision (0 false positives at 92 negatives in testing) |
| Homer validation (optional) | 5 | 10+ | Check recall — are we finding Homer? |
| Non-Homer validation (optional) | 20 | 50+ | Check false positive rate |
Optional: Validation
If you set up validation directories, you can measure accuracy:
validation/
homer/ <- Homer singing clips (ground truth positive)
not_homer/ <- Non-Homer clips (ground truth negative)
Running validate.py outputs a confusion matrix, accuracy, precision, recall, and F1 score, and lists any misclassifications. It exits with code 0 if accuracy is >= 90%.
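For reference, the reported metrics follow from the four confusion-matrix counts. The sketch below shows the arithmetic under stated assumptions: `validation_report` and `exit_code` are hypothetical names, and the non-zero exit value of 1 is a guess, since only the success case (0 at >= 90% accuracy) is documented.

```python
from typing import Dict, Sequence


def validation_report(labels: Sequence[bool], predictions: Sequence[bool]) -> Dict[str, float]:
    """Compute confusion-matrix counts and derived metrics; True means 'Homer'."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)          # Homer, flagged
    fp = sum(1 for y, p in pairs if not y and p)      # not Homer, flagged
    fn = sum(1 for y, p in pairs if y and not p)      # Homer, missed
    tn = sum(1 for y, p in pairs if not y and not p)  # not Homer, not flagged
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def exit_code(report: Dict[str, float], min_accuracy: float = 0.90) -> int:
    """Return 0 when accuracy meets the bar; 1 otherwise (the failure value is assumed)."""
    return 0 if report["accuracy"] >= min_accuracy else 1
```

Precision answers "how many flagged files were really Homer", while recall answers "how many Homer files were found", which is why the sample count table tracks the two validation sets separately.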
CLI reference
detect.py
Usage: detect.py [OPTIONS] [FILES]...
Options:
--dir PATH Process all audio files in directory (recursive)
--timestamps Show per-window similarity breakdown
--threshold FLOAT Detection threshold (auto-selected based on scoring mode)
--no-separation Skip Demucs vocal separation (faster, less accurate)
--quarantine Move detected files into a HOMERS/ subfolder
--log PATH Write a detection log to this file
--dry-run Show what would be moved without actually moving files
--help Show this message and exit
create_reference.py
Usage: create_reference.py [OPTIONS]
Options:
--samples-dir PATH Directory containing Homer singing clips (default: reference_samples)
--output-dir PATH Directory for output embeddings (default: reference_embeddings)
--skip-separation Skip Demucs vocal separation
--negative-dir PATH Directory containing non-Homer audio for classifier training
--max-negatives INT Max number of negative audio files to process (0 = all)
--help Show this message and exit
Confidence levels
When using the classifier (default after training), detection uses probability thresholds:
| Level | Probability | Meaning |
|---|---|---|
| HIGH | >= 0.90 | Very confident Homer detection |
| DETECTED | >= 0.72 | Homer detected |
| POSSIBLE | >= 0.50 | Possible match, review manually |
| NO | < 0.50 | Not Homer |
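The table maps directly onto a chain of threshold checks. A minimal sketch, with `confidence_level` as a hypothetical helper name; whether the real code applies it to the peak or to each window's probability is not stated here.

```python
def confidence_level(probability: float) -> str:
    """Map a classifier probability to the confidence label from the table above."""
    if probability >= 0.90:
        return "HIGH"
    if probability >= 0.72:
        return "DETECTED"
    if probability >= 0.50:
        return "POSSIBLE"
    return "NO"
```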
A file is flagged as "Homer detected" when:
- Peak window probability >= 0.72, and
- Mean window probability >= 0.50 (filters isolated spikes from similar-sounding singers)
- Exception: peaks >= 0.90 bypass the mean requirement
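The file-level rule above can be written as a short function. This is a sketch of the stated rule, not the code in detector.py; the function and parameter names are hypothetical, and treating a file with no scored windows as "not detected" is an assumption.

```python
from typing import Sequence


def is_homer_detected(
    window_probs: Sequence[float],
    peak_threshold: float = 0.72,
    mean_threshold: float = 0.50,
    bypass_peak: float = 0.90,
) -> bool:
    """Apply the peak/mean rule to the per-window classifier probabilities of one file."""
    if not window_probs:
        return False  # assumed behavior when no window passes the energy gate
    peak = max(window_probs)
    mean = sum(window_probs) / len(window_probs)
    if peak >= bypass_peak:
        return True  # very confident peaks bypass the mean requirement
    return peak >= peak_threshold and mean >= mean_threshold
```

For example, a file with window probabilities `[0.75, 0.2, 0.2]` clears the peak threshold but has a mean of about 0.38, so it is rejected as an isolated spike, while `[0.95, 0.1, 0.1]` is accepted through the bypass.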
Performance
- Vocal separation: ~20-40 seconds per song (HDemucs on CPU)
- Embedding + classification: ~1-2 seconds per song
- With --no-separation: ~1-2 seconds total (less accurate for mixed audio)
- Reference building: ~2-5 minutes depending on sample count and negative corpus size
Limitations
- Requires training data you provide. This repo contains no copyrighted audio — you need your own Simpsons clips and music samples.
- Cartoon voice overlap. Other Simpsons characters (especially in ensemble numbers) can occasionally trigger false positives. Adding these as negative training samples helps.
- Short clips. Detection works best on clips >= 5 seconds. Very short Homer appearances (1-2 seconds) may be missed.
- Talking vs singing. The reference is trained on Homer singing. Homer speaking won't reliably trigger detection, though some overlap exists.
- CPU only. Currently runs on CPU. GPU support for HDemucs would significantly speed up vocal separation.
Project structure
homer-detector/
├── detect.py # Main detection script — scan files, quarantine mode
├── create_reference.py # Build reference embeddings + train classifier
├── validate.py # Measure accuracy on labeled validation set
├── setup_env.sh # One-command environment setup
├── requirements.txt # Python dependencies
├── src/
│ ├── __init__.py
│ ├── _webrtcvad_shim.py # Stub to bypass broken webrtcvad dependency
│ ├── audio_utils.py # Audio loading, normalization, windowing
│ ├── classifier.py # GBM classifier training and inference
│ ├── config.py # All tunable constants and thresholds
│ ├── detector.py # Detection logic and result aggregation
│ ├── embedding_engine.py # Resemblyzer wrapper for speaker embeddings
│ ├── features.py # Acoustic feature extraction (MFCCs, pitch)
│ └── vocal_separator.py # HDemucs vocal isolation
├── tests/ # Unit tests (72 tests, no audio files needed)
│ ├── test_audio_utils.py
│ ├── test_classifier.py
│ ├── test_config.py
│ ├── test_detector.py
│ ├── test_features.py
│ └── test_webrtcvad_shim.py
├── reference_samples/ # Your Homer singing clips go here
├── reference_embeddings/ # Generated by create_reference.py (gitignored)
└── validation/ # Optional labeled clips for accuracy testing
├── homer/
└── not_homer/