Automatically align multi-track podcast recordings to a master track.
For the backstory and an interactive walkthrough of the algorithm, see the blog post.
## The problem
Many podcasts are recorded as "double-enders" or "triple-enders" — each participant records their own audio locally while a live session captures everyone together. The result:
- A master track (the merged live recording with all voices)
- Individual tracks per participant (higher quality, isolated audio)
The individual tracks are better for editing (cleaner audio, per-person volume control, noise removal), but they don't start at the same time as the master. One host might join a few seconds late, recording devices have different start times, and clock drift means tracks slowly desync over the course of an hour.
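To get a feel for the scale of clock drift, a quick back-of-the-envelope sketch (the ppm figure is illustrative, not a measured value for any device):

```rust
fn main() {
    // Two "44.1 kHz" recording devices whose real clocks differ by
    // 50 ppm (parts per million) slowly diverge over a recording.
    let nominal_rate = 44_100.0_f64;
    let ppm_difference = 50.0; // illustrative figure, not measured
    let hour_seconds = 3600.0;
    let drift_seconds = hour_seconds * ppm_difference / 1.0e6;
    let drift_samples = drift_seconds * nominal_rate;
    println!("after 1h: {drift_seconds:.3}s drift ({drift_samples:.0} samples)");
    assert!((drift_seconds - 0.18).abs() < 1e-9);
}
```

Even a modest clock mismatch accumulates to audible desync (tenths of a second, thousands of samples) over an hour, which is why a single start-of-file offset isn't enough.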
PodSync fixes this. Give it the master track and the individual tracks, and it outputs new WAV files that are time-aligned to the master. Drop them all into your DAW at position 0:00 and they line up.
## How it works
- **Voice Activity Detection** — Uses WebRTC VAD to find where speech actually occurs in each track, then selects up to 3 strong speech candidates (longer, contiguous regions preferred).
- **Multi-candidate MFCC Cross-Correlation** — Extracts Mel-frequency cepstral coefficients (spectral features used in speech recognition) from the master once, then compares each candidate window against it. Long candidates get sub-windows sampled from the start, middle, and end. The highest-confidence match wins; if a second independent region agrees on the same offset, that corroboration is used as a tie-breaker. MFCCs are robust to volume differences, EQ differences, and the fact that a single voice needs to match against a mixed master.
- **Drift Measurement** — Correlates again near the end of the recording and compares the offset to the start. The difference is drift caused by different clock rates between recording devices. PodSync reports the drift so you know if manual adjustment is needed.
- **Output** — Pads or trims each track to match the master's length and writes synced WAV files.
See ALGORITHM.md for the full technical deep dive.
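The correlation step can be sketched on raw samples. Note this is a simplified illustration: PodSync correlates MFCC feature frames via FFT, not raw samples with a naive loop, and `best_lag` and the test signals here are hypothetical.

```rust
/// Return the lag (in samples) at which `track` best aligns with `master`,
/// searching lags in 0..max_lag. The score is a normalized dot product, so
/// it is insensitive to overall volume differences between the two signals.
fn best_lag(master: &[f32], track: &[f32], max_lag: usize) -> (usize, f32) {
    let mut best = (0usize, f32::MIN);
    for lag in 0..max_lag {
        // Overlap between the shifted track and the master at this lag.
        let n = track.len().min(master.len().saturating_sub(lag));
        if n == 0 { break; }
        let dot: f32 = master[lag..lag + n].iter().zip(&track[..n]).map(|(a, b)| a * b).sum();
        let norm_m = master[lag..lag + n].iter().map(|a| a * a).sum::<f32>().sqrt();
        let norm_t = track[..n].iter().map(|b| b * b).sum::<f32>().sqrt();
        let score = if norm_m > 0.0 && norm_t > 0.0 { dot / (norm_m * norm_t) } else { 0.0 };
        if score > best.1 { best = (lag, score); }
    }
    best
}

fn main() {
    // Synthetic check: the master contains the track starting 5 samples in.
    let track: Vec<f32> = (0..32).map(|i| (i as f32 * 0.7).sin()).collect();
    let mut master = vec![0.0f32; 5];
    master.extend_from_slice(&track);
    let (lag, score) = best_lag(&master, &track, 16);
    println!("lag = {lag}, score = {score:.2}");
    assert_eq!(lag, 5);
}
```

Measuring the same lag once near the start and once near the end then gives the drift: it is simply the difference between the two measured offsets.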
## Install

Requires the Rust toolchain (rustup).

```shell
git clone https://github.com/kaushikgopal/podsync.git
cd podsync
make
```

This builds a release binary at `scripts/podsync`. Add it to your PATH or invoke it directly.
## Usage

```shell
scripts/podsync \
  --master /path/to/episode-master.mp3 \
  --tracks /path/to/host1-clean.wav \
  --tracks /path/to/host2-clean.wav
```
### Options

| Flag | Required | Default | Description |
|---|---|---|---|
| `--master` | Yes | — | Master/sync reference track (mp3, wav, aiff, flac, ogg) |
| `--tracks` | Yes | — | Individual tracks to sync (repeat the flag for multiple tracks) |
| `--sync-window` | No | `120` | Seconds of speech to use for cross-correlation |
| `--output-suffix` | No | `synced` | Suffix appended to output filenames |
## Output
Synced files are written to the same directory as the input tracks:
- Filename: `{original_name}-{suffix}.wav`
- Format: 44.1 kHz, 24-bit WAV
- Length: matches the master track exactly
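The pad-or-trim step can be sketched as follows, assuming samples are already decoded to mono `f32` at the master's rate. `align_to_master` is an illustrative name, and the sign convention (positive offset = track started later than the master) is an assumption, not taken from PodSync's source:

```rust
/// Shift a track by `offset_samples` and force it to `master_len` samples.
fn align_to_master(track: &[f32], offset_samples: i64, master_len: usize) -> Vec<f32> {
    let mut out: Vec<f32> = if offset_samples >= 0 {
        // Track starts later than the master: prepend silence.
        let mut v = vec![0.0f32; offset_samples as usize];
        v.extend_from_slice(track);
        v
    } else {
        // Track starts earlier: drop the leading samples.
        track.iter().skip((-offset_samples) as usize).copied().collect()
    };
    // Trim or zero-pad the tail so every output matches the master's length.
    out.resize(master_len, 0.0);
    out
}

fn main() {
    let track = vec![0.5f32; 4];
    let synced = align_to_master(&track, 2, 8);
    assert_eq!(synced, vec![0.0, 0.0, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0]);
    println!("{synced:?}");
}
```

Because every output has exactly the master's length, dropping all files at position 0:00 in a DAW lines them up with no further nudging.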
A timestamped log file (`podsync-<epoch>.log`) is written next to the master file, summarizing offsets, confidence, and drift for each track.
## Example

```shell
$ podsync --master ep-master.mp3 --tracks ep-host1-clean.wav --tracks ep-host2-clean.wav
Loading master: ep-master.mp3
  Duration: 58m32s at 44100Hz
Processing ep-host1-clean.wav...
  Detecting speech regions... found 47m32s of speech
  Correlating against master... offset: +1.23s (confidence: 0.94)
  Measuring drift... 0.15s at master end
Processing ep-host2-clean.wav...
  Detecting speech regions... found 38m15s of speech
  Correlating against master... offset: +0.87s (confidence: 0.91)
  Measuring drift... 0.08s at master end
============================================================
Summary:
  ep-host1-clean-synced.wav  offset: +1.23s  drift: 0.15s ✓
  ep-host2-clean-synced.wav  offset: +0.87s  drift: 0.08s ✓
============================================================
2 tracks synchronized successfully
```
## Verifying results
After syncing, verify in your DAW:
- Import all `-synced.wav` files
- Place all at position 0:00
- Solo each track to confirm voices align
- Spot-check the middle and end for drift
## AI Skill
This repo doubles as an AI coding agent skill. `.agents/skills/podsync/SKILL.md` contains orchestration instructions for AI agents — scanning episode folders for files, confirming selections with the user, invoking the CLI, and reporting results.

To use it as a skill, symlink or copy `.agents/skills/podsync/` into your project's skill directory.
## Running tests

```shell
make test
```

Or directly:

```shell
cargo test
```
## Dependencies
| Crate | Purpose |
|---|---|
| symphonia | Decode MP3, WAV, FLAC, OGG, AIFF |
| rubato | Resample to 44.1kHz |
| audioadapter-buffers | Buffer adapter for rubato's resampler API |
| hound | Write 24-bit WAV |
| webrtc-vad | Voice activity detection (Google WebRTC C lib) |
| realfft | FFT for MFCC extraction and cross-correlation |
| clap | CLI argument parsing |
See DEPENDENCIES.md for detailed rationale.
## License
MIT