Show HN: Chipmunkify – I used ML to solve audio's dumbest problem
chipmunkify.com
Hi HN,
You know those YouTube channels with millions of views that are just popular songs, but chipmunk? They sound awful because they pitch-shift the entire track.
I fixed it.
The pipeline: Upload an MP3 -> Demucs isolates the vocals -> Rubber Band pitch-shifts them into the rodent register -> FFmpeg glues it back together. Hosted on Modal.
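For anyone curious what that chain might look like as actual commands, here is a rough sketch, not the site's real code. It assumes the `ffmpeg`, `demucs`, and `rubberband` command-line tools are installed. The `htdemucs` model name, the 7-semitone shift, the `amix` mixdown, and every file name are illustrative guesses. `build_commands` and `run_pipeline` are hypothetical helpers.

```python
import subprocess
from pathlib import Path


def build_commands(upload, workdir, semitones=7.0, clip_seconds=30):
    """Return the four argv lists for trim -> separate -> shift -> remix.

    All paths and defaults here are assumptions made for illustration.
    """
    work = Path(workdir)
    clip = work / "clip.wav"
    stems = work / "stems"
    # Demucs writes its stems to <out>/<model name>/<track name>/
    stem_dir = stems / "htdemucs" / clip.stem
    vocals = stem_dir / "vocals.wav"
    backing = stem_dir / "no_vocals.wav"
    shifted = work / "vocals_shifted.wav"
    final = work / "chipmunk.mp3"
    return [
        # 1. Keep only the first clip_seconds of the upload.
        ["ffmpeg", "-y", "-i", str(upload), "-t", str(clip_seconds), str(clip)],
        # 2. Split the clip into vocals and everything else.
        ["demucs", "-n", "htdemucs", "--two-stems=vocals",
         "-o", str(stems), str(clip)],
        # 3. Raise only the vocals by the given number of semitones.
        ["rubberband", "--pitch", str(semitones), str(vocals), str(shifted)],
        # 4. Mix the shifted vocals back over the untouched backing track.
        ["ffmpeg", "-y", "-i", str(shifted), "-i", str(backing),
         "-filter_complex", "amix=inputs=2:duration=longest", str(final)],
    ]


def run_pipeline(upload, workdir):
    """Run each step in order and stop at the first failure."""
    for argv in build_commands(upload, workdir):
        subprocess.run(argv, check=True)
```

Building the argv lists separately from running them makes the steps easy to inspect or log before anything expensive starts.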
Guardrails protecting my $5/month Modal budget and net worth:
* 10MB max upload
* First 30 seconds only (Demucs is computationally brutal)
* 3 requests/IP/day
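A minimal sketch of how the size cap and the per-IP daily limit could be enforced, assuming a single long-lived process. The `Guardrails` class is hypothetical, "10MB" is read as 10 MiB, and the in-memory counter is a stand-in: on a serverless host it would probably need a shared store to survive across containers.

```python
from collections import defaultdict
from datetime import date

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumption: "10MB" means 10 MiB
MAX_REQUESTS_PER_DAY = 3


class Guardrails:
    """Hypothetical in-memory gatekeeper for uploads."""

    def __init__(self):
        # (ip, day) -> number of accepted requests.
        # Old days are never pruned in this sketch.
        self._counts = defaultdict(int)

    def check(self, ip, size_bytes, today=None):
        """Return (allowed, reason). Rejected requests do not use up quota."""
        today = today or date.today()
        if size_bytes > MAX_UPLOAD_BYTES:
            return False, "upload too large"
        key = (ip, today)
        if self._counts[key] >= MAX_REQUESTS_PER_DAY:
            return False, "daily limit reached"
        self._counts[key] += 1
        return True, "ok"
```

Checking the size before touching the counter means an oversized upload is turned away without costing the visitor one of their three daily tries.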
I delete your files immediately out of principle, but also because I genuinely cannot afford to store them.
If the site throws an error, you've assassinated my budget and I will pour one out.
Try it out and let me know what you think. I suggest using a song that you love!
love the budget-constraints section, feels way too familiar. i'm running 7 NLP models on an $11/month VPS for a different project and every architecture decision ends up being "what's the cheapest way that doesn't burn my bank account on one runaway request."
question: how are you handling demucs cold starts on modal? cold start was what eventually pushed me off modal for a request-response use case. the user is staring at a spinner for 20-30s on first invocation and it kills conversion. did you solve it or just eat it because the rest is so computationally heavy anyway?
honestly a very valid problem, and an equally valid solution. Would invest
thx!