GitHub - nohomersclub/NoHomersClub: No Homers Club

Homer Simpson Singing Detector

Detects Homer Simpson singing in audio files using speaker embedding similarity and a trained gradient boosting classifier. Point it at your music library, and it will find (and optionally quarantine) any tracks where Homer is belting one out.

How it works

Audio File
    |
    v
[HDemucs Vocal Separation] ─── strips instrumentals, isolates vocals
    |
    v
[3-second Sliding Windows] ─── energy-gated, 50% overlap
    |
    v
[Resemblyzer Speaker Embeddings] ─── 256-D voice fingerprint per window
    |
    v
[GBM Classifier] ─── 13 features: 5 similarity + 8 acoustic (MFCCs, pitch, spectral)
    |
    v
[Detection Decision] ─── peak/mean probability thresholds
    |
    v
HOMER DETECTED / NOT DETECTED
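
The windowing stage of the diagram can be sketched in plain Python. This is only an illustration of 3-second windows with 50% overlap and an energy gate; the function names and the `min_rms` gate value are assumptions, not the project's actual code or constants (those live in src/audio_utils.py and src/config.py).

```python
import math

def rms(samples):
    """Root-mean-square energy of a sequence of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def energy_gated_windows(samples, sample_rate, window_s=3.0, overlap=0.5, min_rms=0.01):
    """Return (start_seconds, window) pairs for windows loud enough to keep.

    min_rms is a hypothetical gate value chosen for this sketch.
    """
    size = int(window_s * sample_rate)
    hop = max(1, int(size * (1.0 - overlap)))  # 50% overlap -> hop of half a window
    kept = []
    for start in range(0, len(samples) - size + 1, hop):
        window = samples[start:start + size]
        if rms(window) >= min_rms:  # skip near-silent windows
            kept.append((start / sample_rate, window))
    return kept

# Toy signal: 3 s of silence followed by 6 s of a square-ish wave, at 100 Hz.
sr = 100
audio = [0.0] * (3 * sr) + [0.5 if i % 2 == 0 else -0.5 for i in range(6 * sr)]
print([start for start, _ in energy_gated_windows(audio, sr)])
# -> [1.5, 3.0, 4.5, 6.0]  (the all-silent window at 0.0 s is dropped)
```

Gating before embedding means silent stretches never reach the speaker encoder, so they cannot drag down a file's mean probability.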

Quick start

./setup_env.sh                                          # 1. Install dependencies
# Add Homer singing clips to reference_samples/         # 2. Gather audio (see below)
python create_reference.py --negative-dir validation/not_homer/  # 3. Build reference
python detect.py --dir ~/Music/                         # 4. Scan your music

Prerequisites

  • Python 3.12+
  • ~400 MB disk for model weights (downloaded automatically on first run):
    • Resemblyzer speaker encoder (~20 MB)
    • HDemucs vocal separation model (~350 MB)
  • Audio files of Homer Simpson singing (you supply these — see below)

Setup

# Clone and set up
git clone <this-repo>
cd homer-detector
./setup_env.sh

# Or manually:
python3 -m venv .venv
source .venv/bin/activate
pip install resemblyzer --no-deps   # avoids broken webrtcvad C extension
pip install -r requirements.txt

Step 1: Gather Homer singing samples

You need short clips (5-30 seconds each) of Homer Simpson singing. These are used to build a voice profile and train the classifier.

Where to find them: Extract clips from episodes of The Simpsons that you own. Look for scenes where Homer is clearly singing, ideally without too much background noise or other characters singing simultaneously.

Good scenes to look for:

| Episode | Scene | Why it's good |
|---|---|---|
| S5E1 "Homer's Barbershop Quartet" | "Baby on Board" performance | Clear solo vocal lines |
| S4E9 "Mr. Plow" | Mr. Plow jingle | Short, clear, iconic |
| The Simpsons Movie | "Spider-Pig" | Well-known, clean vocal |
| S8E5 "Bart After Dark" | "We Put the Spring in Springfield" | Extended singing |
| S7E4 "Bart Sells His Soul" | "In-A-Gadda-Da-Vida" church organ | Distinctive vocal style |
| S6E12 "Homer the Great" | "We Do (The Stonecutters' Song)" | Group number with Homer prominent |
| Various | Karaoke scenes, musical numbers | Any scene with Homer singing |

File format: Any common audio format works (.wav, .mp3, .flac, .ogg, .m4a, .aac). Save clips to reference_samples/.
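
To check which clips in a folder would be picked up, a small helper like the one below can list them. It is a sketch that assumes files are selected purely by extension; `find_audio_files` and `AUDIO_EXTENSIONS` are hypothetical names, and the project's own loader may apply different rules.

```python
from pathlib import Path

# Extensions listed in this README; matching is case-insensitive here by assumption.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a", ".aac"}

def find_audio_files(root):
    """Recursively list files under root whose extension is a supported audio type."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )

if __name__ == "__main__":
    for path in find_audio_files("reference_samples"):
        print(path)
```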

Tips:

  • Include a variety of Homer's singing styles (loud belting, quiet crooning, off-key)
  • Clips don't need to be perfectly clean — the vocal separator handles background music
  • More clips = better voice profile, but even 5 clips work

Step 2: Gather non-Homer training samples

The classifier needs negative examples to learn what Homer does not sound like. Save these to validation/not_homer/.

Types of negatives to include (in rough order of difficulty):

  • Male baritone/crooner singers (hardest negatives — closest to Homer's vocal range): Rick Astley, Johnny Cash, Elvis Presley, Frank Sinatra, Leonard Cohen
  • Male rock/pop singers: Beatles, Queen, Billy Joel, Elton John, David Bowie
  • Other Simpsons characters (very hard): Bart singing, Mr. Burns, Krusty, ensemble numbers without Homer
  • Female singers (easy negatives): Whitney Houston, Beyoncé, Adele, Taylor Swift
  • Rap/hip-hop: For spectral diversity

Aim for 5-30 second clips, in the same formats as above. Diversity matters more than quantity: a few clips from each category are better than many clips from one artist.

Step 3: Build the reference

# Basic (synthetic negatives only)
python create_reference.py

# With real negative samples (recommended — much better precision)
python create_reference.py --negative-dir validation/not_homer/

# If your clips are already isolated vocals (no background music)
python create_reference.py --skip-separation --negative-dir validation/not_homer/

# Limit negative processing for speed
python create_reference.py --negative-dir validation/not_homer/ --max-negatives 30

This will:

  1. Separate vocals from each clip (using HDemucs)
  2. Extract speaker embeddings (using Resemblyzer)
  3. Train a gradient boosting classifier on similarity + acoustic features
  4. Save everything to reference_embeddings/
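
The README does not spell out the five similarity features, so the following is only a guess at what one of them could look like: the cosine similarity between a window's embedding and the centroid of the Homer reference embeddings. All function names are hypothetical, and the toy vectors are 3-D rather than the 256-D embeddings Resemblyzer produces.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def centroid(embeddings):
    """Element-wise mean of equal-length vectors, re-normalized to unit length."""
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return l2_normalize(mean)

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

# Toy data standing in for reference-clip embeddings and one window embedding.
reference = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
window = [0.85, 0.15, 0.05]
print(round(cosine_similarity(window, centroid(reference)), 3))
```

A classifier trained on features like this, plus the acoustic ones, can learn a decision boundary instead of relying on a single hand-picked similarity cutoff.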

Step 4: Scan your music

# Single file
python detect.py song.mp3

# Entire directory (recursive)
python detect.py --dir ~/Music/playlist/

# See per-window similarity breakdown
python detect.py song.mp3 --timestamps

# Quarantine mode — move detected files to HOMERS/ subfolder
python detect.py --dir ~/Music/ --quarantine

# Preview what would be moved (no files touched)
python detect.py --dir ~/Music/ --quarantine --dry-run

# Skip vocal separation (faster, less accurate)
python detect.py --dir ~/Music/ --no-separation

# Custom detection threshold
python detect.py --dir ~/Music/ --threshold 0.8
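
As a rough picture of what quarantine mode and --dry-run do, here is a minimal sketch. It assumes the HOMERS/ folder is created next to each detected file, which the README does not confirm; `quarantine` is a hypothetical helper, not the function detect.py actually uses.

```python
import shutil
from pathlib import Path

def quarantine(detected_files, dry_run=False, folder_name="HOMERS"):
    """Move each detected file into a HOMERS/ subfolder beside it.

    Returns the planned (source, destination) pairs. With dry_run=True
    the moves are only printed and nothing on disk changes.
    """
    moves = []
    for src in map(Path, detected_files):
        dest = src.parent / folder_name / src.name
        moves.append((src, dest))
        if dry_run:
            print(f"would move {src} -> {dest}")
            continue
        dest.parent.mkdir(exist_ok=True)  # create HOMERS/ on first use
        shutil.move(str(src), str(dest))
    return moves
```

Running with --dry-run first is the safer habit: it lets you review the planned moves before any file leaves its original folder.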

Sample count guidance

| Category | Minimum | Recommended | Why |
|---|---|---|---|
| Homer reference clips | 5 | 8-15 | Voice profile + classifier training positives |
| Non-Homer training clips | 20 | 50+ | Classifier precision (0 false positives at 92 negatives in testing) |
| Homer validation (optional) | 5 | 10+ | Check recall — are we finding Homer? |
| Non-Homer validation (optional) | 20 | 50+ | Check false positive rate |

Optional: Validation

If you set up validation directories, you can measure accuracy:

validation/
  homer/        <- Homer singing clips (ground truth positive)
  not_homer/    <- Non-Homer clips (ground truth negative)

validate.py reports a confusion matrix, accuracy, precision, recall, and F1 score, and lists any misclassified files. It exits with code 0 if accuracy is at least 90%.
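
For reference, the reported metrics follow the standard definitions, sketched below. The function names are hypothetical, and the non-zero exit code of 1 is an assumption; the README only states that the exit code is 0 when accuracy reaches 90%.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FP, TN, FN) for boolean labels, where True means Homer."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif not truth and pred:
            fp += 1
        elif not truth and not pred:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def summarize(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def exit_code(accuracy, required=0.90):
    """0 when accuracy meets the bar; 1 otherwise (the failure value is assumed)."""
    return 0 if accuracy >= required else 1

# 4 Homer clips and 6 non-Homer clips, with one miss and one false alarm.
truth = [True] * 4 + [False] * 6
preds = [True, True, True, False, False, False, False, False, False, True]
report = summarize(truth, preds)
print(report, "exit code:", exit_code(report["accuracy"]))
```

Precision tells you how many quarantined files really contain Homer; recall tells you how many Homer tracks slipped through.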

CLI reference

detect.py

Usage: detect.py [OPTIONS] [FILES]...

Options:
  --dir PATH          Process all audio files in directory (recursive)
  --timestamps        Show per-window similarity breakdown
  --threshold FLOAT   Detection threshold (auto-selected based on scoring mode)
  --no-separation     Skip Demucs vocal separation (faster, less accurate)
  --quarantine        Move detected files into a HOMERS/ subfolder
  --log PATH          Write a detection log to this file
  --dry-run           Show what would be moved without actually moving files
  --help              Show this message and exit

create_reference.py

Usage: create_reference.py [OPTIONS]

Options:
  --samples-dir PATH   Directory containing Homer singing clips (default: reference_samples)
  --output-dir PATH    Directory for output embeddings (default: reference_embeddings)
  --skip-separation    Skip Demucs vocal separation
  --negative-dir PATH  Directory containing non-Homer audio for classifier training
  --max-negatives INT  Max number of negative audio files to process (0 = all)
  --help               Show this message and exit

Confidence levels

When using the classifier (default after training), detection uses probability thresholds:

| Level | Probability | Meaning |
|---|---|---|
| HIGH | >= 0.90 | Very confident Homer detection |
| DETECTED | >= 0.72 | Homer detected |
| POSSIBLE | >= 0.50 | Possible match, review manually |
| NO | < 0.50 | Not Homer |

A file is flagged as "Homer detected" when:

  • Peak window probability >= 0.72, and
  • Mean window probability >= 0.50 (filters isolated spikes from similar-sounding singers)
  • Exception: peaks >= 0.90 bypass the mean requirement
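
The flagging rule above can be written out as a short function. This is a sketch built only from the thresholds stated in this section; the names are hypothetical, and treating a file with no usable windows as "not detected" is an assumption.

```python
HIGH, DETECTED, POSSIBLE = 0.90, 0.72, 0.50

def confidence_level(probability):
    """Map a single probability to the level names in the table above."""
    if probability >= HIGH:
        return "HIGH"
    if probability >= DETECTED:
        return "DETECTED"
    if probability >= POSSIBLE:
        return "POSSIBLE"
    return "NO"

def homer_detected(window_probs):
    """Apply the peak/mean rule to a file's per-window probabilities."""
    if not window_probs:  # assumption: no usable windows means no detection
        return False
    peak = max(window_probs)
    mean = sum(window_probs) / len(window_probs)
    if peak >= HIGH:  # very confident peaks bypass the mean requirement
        return True
    return peak >= DETECTED and mean >= POSSIBLE

print(homer_detected([0.80, 0.10, 0.10]))  # False: isolated spike, mean too low
print(homer_detected([0.80, 0.60, 0.50]))  # True: peak and mean both pass
print(homer_detected([0.95, 0.10, 0.10]))  # True: peak >= 0.90 bypasses the mean
```

The mean requirement is what keeps a single Homer-like phrase from a baritone crooner from flagging an entire track.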

Performance

  • Vocal separation: ~20-40 seconds per song (HDemucs on CPU)
  • Embedding + classification: ~1-2 seconds per song
  • With --no-separation: ~1-2 seconds total (less accurate for mixed audio)
  • Reference building: ~2-5 minutes depending on sample count and negative corpus size
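
To get a feel for what these per-song figures mean for a whole library, the arithmetic can be sketched as below. The helper is hypothetical and simply multiplies the rough per-song ranges quoted above; real timings depend on track length and hardware.

```python
def scan_time_hours(num_songs, separation_s=(20, 40), classify_s=(1, 2), use_separation=True):
    """Return a (low, high) estimate in hours from the README's per-song ranges."""
    low = classify_s[0] + (separation_s[0] if use_separation else 0)
    high = classify_s[1] + (separation_s[1] if use_separation else 0)
    return num_songs * low / 3600, num_songs * high / 3600

low, high = scan_time_hours(1000)
print(f"1000 songs with separation: {low:.1f}-{high:.1f} hours")

low, high = scan_time_hours(1000, use_separation=False)
print(f"1000 songs with --no-separation: {low * 60:.0f}-{high * 60:.0f} minutes")
```

For a 1000-song library this works out to roughly 6-12 hours with vocal separation versus well under an hour without it, so a fast --no-separation pass followed by a full pass on borderline files may be worth considering.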

Limitations

  • Requires training data you provide. This repo contains no copyrighted audio — you need your own Simpsons clips and music samples.
  • Cartoon voice overlap. Other Simpsons characters (especially in ensemble numbers) can occasionally trigger false positives. Adding these as negative training samples helps.
  • Short clips. Detection works best on clips >= 5 seconds. Very short Homer appearances (1-2 seconds) may be missed.
  • Talking vs singing. The reference is trained on Homer singing. Homer speaking won't reliably trigger detection, though some overlap exists.
  • CPU only. Currently runs on CPU. GPU support for HDemucs would significantly speed up vocal separation.

Project structure

homer-detector/
├── detect.py              # Main detection script — scan files, quarantine mode
├── create_reference.py    # Build reference embeddings + train classifier
├── validate.py            # Measure accuracy on labeled validation set
├── setup_env.sh           # One-command environment setup
├── requirements.txt       # Python dependencies
├── src/
│   ├── __init__.py
│   ├── _webrtcvad_shim.py # Stub to bypass broken webrtcvad dependency
│   ├── audio_utils.py     # Audio loading, normalization, windowing
│   ├── classifier.py      # GBM classifier training and inference
│   ├── config.py          # All tunable constants and thresholds
│   ├── detector.py        # Detection logic and result aggregation
│   ├── embedding_engine.py# Resemblyzer wrapper for speaker embeddings
│   ├── features.py        # Acoustic feature extraction (MFCCs, pitch)
│   └── vocal_separator.py # HDemucs vocal isolation
├── tests/                 # Unit tests (72 tests, no audio files needed)
│   ├── test_audio_utils.py
│   ├── test_classifier.py
│   ├── test_config.py
│   ├── test_detector.py
│   ├── test_features.py
│   └── test_webrtcvad_shim.py
├── reference_samples/     # Your Homer singing clips go here
├── reference_embeddings/  # Generated by create_reference.py (gitignored)
└── validation/            # Optional labeled clips for accuracy testing
    ├── homer/
    └── not_homer/