Syncing lights with music: Marzullo's algorithm in the DJ booth

One of my favorite side projects is automatically syncing lights with music at house parties. I built a system to do this back in 2022-23, and it’s since been used at dozens of small events.

It’s a fun intersection of technology and non-technical topics, so I finally set aside time to write about it as a short series of posts. In this inaugural post I’ll talk about how we can keep in sync with a live DJ, even if they skip parts of a song or change tempos.

I wanted my lighting system to be fully automatic, following the structure of a song and building up to a drop like you see at concerts. Ideally I could hit play in Spotify, Djay Pro (auto-DJing), or Rekordbox (live DJing) and colorful lights would start doing their thing. This requires knowing which song is currently playing and where we are in it, so we can plan a light show for the entire song.[1]

I wanted to extract this info (the current track and seek position) directly from the music software, rather than attempting shenanigans with a microphone (consider remixes and repeated choruses).[2] How do we do this?

Spotify is the easiest platform to integrate with because we can query its internal state using AppleScript.[3]
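For the Spotify side, here is a minimal Python sketch of such a query, assuming macOS with the Spotify desktop app running. It pipes an AppleScript into the osascript command-line tool and reads the current track’s name and artist plus Spotify’s player position (in seconds) from its AppleScript dictionary. The function names and the newline-separated output format are illustrative choices, not a fixed interface.

```python
import subprocess

# AppleScript sent to macOS's osascript tool. "player position" is Spotify's
# playback position in seconds; name/artist describe the current track.
SPOTIFY_SCRIPT = """
tell application "Spotify"
    set t to current track
    return (name of t) & linefeed & (artist of t) & linefeed & (player position as string)
end tell
"""


def query_spotify() -> str:
    """Run the script (macOS only) and return osascript's raw stdout."""
    result = subprocess.run(
        ["osascript"],  # with no script argument, osascript reads the script from stdin
        input=SPOTIFY_SCRIPT,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_spotify_output(raw: str) -> tuple[str, str, float]:
    """Split the three newline-separated fields into (title, artist, seconds)."""
    title, artist, position = raw.rstrip("\n").rsplit("\n", 2)
    # Some locales print the decimal separator as a comma.
    return title, artist, float(position.replace(",", "."))
```

Keeping the parsing separate from the subprocess call makes it easy to test the parsing on machines without Spotify.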

But what about Rekordbox and Djay Pro? A lot of reverse engineering work has been done on Rekordbox, including extracting playback state from DJ hardware network packets or process memory dumps on Windows. But these approaches have a huge number of limitations.

I settled on screen-recording Rekordbox and using OCR to extract info from its UI. This is terrible, yet it worked very well and was the easiest option to set up on the laptop used for DJing. I did something similar for Djay Pro, where I could use macOS accessibility tools instead of OCR.

For example, from the following screenshot we’d extract the “00:30.9” seek time, the “124.0” playback speed (shown as a tempo in BPM), the track title, and the artist[4]:
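Once OCR hands us a string like “00:30.9”, the number of decimal places also tells us how coarse the reading is, which matters for everything that follows. Here is a small Python sketch, assuming the seek time is always rendered as minutes:seconds with an optional fractional part. parse_seek is a hypothetical helper, and strings that don’t match the pattern are rejected as likely OCR misreads.

```python
import re

# Assumed display format: "MM:SS" or "MM:SS.f" (any number of fractional digits).
SEEK_RE = re.compile(r"(\d+):(\d{2})(?:\.(\d+))?")


def parse_seek(text: str) -> tuple[float, float]:
    """Return (truncated_seek_seconds, resolution_seconds) or raise ValueError."""
    m = SEEK_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognized seek time: {text!r}")
    minutes, seconds, frac = m.groups()
    if int(seconds) >= 60:
        raise ValueError(f"implausible seconds field: {text!r}")
    # One decimal digit means the display ticks every 0.1s, none means every 1s.
    resolution = 10.0 ** -len(frac) if frac else 1.0
    value = int(minutes) * 60 + int(seconds) + (int(frac) * resolution if frac else 0.0)
    return value, resolution
```

Format checks only catch obviously broken reads; a misread that still looks like a valid time has to be handled further downstream.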

We need 10ms of precision for the seek time to make lights look in sync, yet UIs often truncate seek time to 0.1s or 1s. How do we work around this?

Because the current seek position is a moving target, we’ll think in terms of the track’s wall clock start time and its playback speed. We’ll also quantify uncertainty by thinking in terms of intervals.
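Concretely, if the track plays at rate times real time (1.0 for unchanged speed), then seek = (now - start) * rate, so start = now - seek / rate. A truncated reading s with step r means the true seek lies in [s, s + r). If we only know the screenshot was captured somewhere between t_before and t_after, the start time must lie in [t_before - (s + r) / rate, t_after - s / rate]. A Python sketch of that calculation follows; the function name, the bracketing of capture time, and deriving rate from displayed BPM over the track’s original BPM are all assumptions for illustration.

```python
def start_time_interval(t_before: float, t_after: float, seek: float,
                        resolution: float, rate: float = 1.0) -> tuple[float, float]:
    """Return (lo, hi): wall-clock start times consistent with one screenshot.

    Assumes the UI truncates, so the true seek lies in [seek, seek + resolution),
    and that the frame was captured somewhere in [t_before, t_after].
    `rate` is playback speed relative to the original track (hypothetically
    displayed BPM / original BPM).
    """
    # Earliest start: latest possible seek at the earliest possible capture time.
    lo = t_before - (seek + resolution) / rate
    # Latest start: earliest possible seek at the latest possible capture time.
    hi = t_after - seek / rate
    return lo, hi
```

For example, a “00:30.9” reading captured between t = 100.00 and t = 100.02 gives a start interval of roughly [69.0, 69.12].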

One approach is to average now - truncatedSeek across a bunch of screenshots. If we record in a hot loop[5] and avoid aliasing, this average will be close to the true song start time, offset by half the truncation step (0.05s or 0.5s). I used the averaging approach for a while and it worked well enough, but it made it hard to tell whether latency bugs came from the audio stack, the lights, the song analysis, or a wrong seek position estimate.
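A minimal Python sketch of the averaging estimator, assuming playback at the original speed (rate 1.0) and capture times spread evenly across each display tick. Because truncation makes every now - truncatedSeek value late by some amount in [0, resolution), the mean lands about half a step after the true start, so the sketch subtracts resolution / 2. The function name is hypothetical.

```python
from statistics import fmean


def average_start_estimate(readings: list[tuple[float, float]],
                           resolution: float) -> float:
    """Estimate the start time from (capture_time, truncated_seek) pairs.

    Assumes rate 1.0 and capture times that sample every phase of a display
    tick evenly (no aliasing between the capture loop and the display).
    """
    # Each raw value overshoots the true start by an amount in [0, resolution).
    raw_mean = fmean(t - s for t, s in readings)
    # On average that overshoot is half a step, so remove it.
    return raw_mean - resolution / 2
```

The estimate is only as good as the no-aliasing assumption: if the capture loop locks onto the display’s update rhythm, the overshoots stop being uniform and the half-step correction no longer applies.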

Fortunately there’s a more robust approach. In each screenshot, the truncated seek time implies an interval of possible precise seek times, which in turn implies an interval of possible track start times. If we take a bunch of screenshots, we can look at where these intervals overlap, which gives us far more precision:

This is essentially Marzullo’s algorithm, which was created for “estimating accurate times from a number of noisy time sources”; a refined version of it is used by NTP.
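Here is a compact Python sketch of the classic sweep formulation of Marzullo’s algorithm: sort every interval endpoint, walk through them counting how many intervals are currently open, and remember the stretch where that count peaks. Each (lo, hi) pair stands for one screenshot’s interval of possible start times; the function name and return shape are illustrative.

```python
def marzullo(intervals: list[tuple[float, float]]) -> tuple[int, float, float]:
    """Return (count, lo, hi) for the stretch covered by the most intervals.

    Endpoints are sorted with starts (-1) before ends (+1) at equal offsets,
    so intervals that merely touch still count as overlapping.
    """
    if not intervals:
        raise ValueError("need at least one interval")
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))  # interval opens
        edges.append((hi, +1))  # interval closes
    edges.sort()
    best = count = 0
    best_lo = best_hi = 0.0
    for i, (offset, kind) in enumerate(edges):
        count -= kind  # an opening edge raises the count, a closing edge lowers it
        if count > best:
            # The count only rises at an opening edge, which always has a
            # later edge (at least its own closing edge) after it.
            best, best_lo, best_hi = count, offset, edges[i + 1][0]
    return best, best_lo, best_hi
```

With intervals (69.0, 69.12), (69.05, 69.2), (69.1, 69.3) and a misread (80.0, 80.1), it returns a count of 3 for the stretch from 69.1 to 69.12; the misread interval never joins the winning overlap.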

The algorithm lets us do some cool things:

We can deliberately time our screenshots to bisect the start time interval, getting us to 1ms accuracy (we do this by taking a screenshot when we expect the truncated time to tick over, then checking whether we overshot or undershot)
OCR errors are a non-issue, because they’re out-voted by accurate reads
Playback speed changes are handled naturally if we look for the most popular (tempo, start time interval) intersection instead of only thinking about start time
We can write better heuristics to detect the DJ skipping part of a song, for example by quickly checking whether recent screenshots’ intervals all agree with each other but all disagree with earlier screenshots’ intervals
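The bisection trick from the first bullet can be sketched as two Python helpers under a few assumptions: the current estimate is an interval [lo, hi] for the start time, the display ticks every resolution seconds of track time, and each screenshot’s capture time is bracketed by t_before and t_after. next_probe and refine are hypothetical names. Aiming the screenshot at the moment the next tick would appear if the start time were the midpoint of [lo, hi] means the result tells us which half the true start lies in.

```python
import math


def next_probe(lo: float, hi: float, now: float, resolution: float,
               rate: float = 1.0, margin: float = 0.05) -> tuple[int, float]:
    """Choose a display tick k and the wall-clock time to screenshot it.

    Tick k (displayed seek = k * resolution) appears at start + k * step,
    where step = resolution / rate. We aim at the tick time implied by the
    midpoint of [lo, hi], at least `margin` seconds in the future.
    """
    mid = (lo + hi) / 2
    step = resolution / rate
    k = math.ceil((now + margin - mid) / step)
    return k, mid + k * step


def refine(lo: float, hi: float, k: int, displayed_seek: float,
           t_before: float, t_after: float, resolution: float,
           rate: float = 1.0) -> tuple[float, float]:
    """Shrink [lo, hi] using one screenshot aimed at tick k."""
    tick_offset = k * resolution / rate
    # Compare in whole ticks to sidestep float noise in k * resolution.
    ticked = round(displayed_seek / resolution) >= k
    if ticked:
        # Tick k was visible by t_after, so start + tick_offset <= t_after.
        hi = min(hi, t_after - tick_offset)
    else:
        # Tick k was not yet visible at t_before, so start + tick_offset > t_before.
        lo = max(lo, t_before - tick_offset)
    return lo, hi
```

Each probe halves the interval, so starting from a 0.1s-wide interval, seven probes bring it under 1ms (0.1s / 2^7 ≈ 0.78ms), and a 1s display needs ten (1s / 2^10 ≈ 0.98ms). Capture latency (t_after - t_before) roughly limits how far this can go, and a single misread probe pushes the bound the wrong way, so in practice each probe is better treated as one more vote than as a hard constraint.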

Migrating to this approach let me trust the seek position estimate and freed my mind to think about other things. That’s everything I could ask for in a playback state tracking implementation.

If people find this interesting, I’ll write more about this project in follow-up posts. Some possible topics:

How should we plan light effects? How about DJ transitions? Why use Markov chains?
How can we use various hardware protocols to control lights? (IR remotes, DMX, ws2812b)
How can I abuse LED strips to get more than 256^3 colors?
