Show HN: Cartoon Studio – an open-source desktop app for making 2D cartoon shows
Hi HN, I built Cartoon Studio, an open-source desktop app for making simple 2D cartoon scenes and shows.
The basic flow is: place SVG characters on a scene, write dialogue, pick voices, and render to MP4. It handles word timestamps, mouth cues, and lip-sync automatically.
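The word-timestamps-to-mouth-cues step could be sketched roughly like this. This is a minimal illustration of a per-word open/closed-mouth heuristic, not Cartoon Studio's actual code; the tuple formats and the `open_fraction` timing rule are assumptions:

```python
# Sketch: turn word timestamps from a TTS provider into open/closed mouth
# cues for a South Park-style character. One "open" pulse per word; the
# exact cue format and split rule here are illustrative assumptions.

def mouth_cues(words, open_fraction=0.6):
    """words: list of (text, start_sec, end_sec) tuples.

    Returns a list of (state, start_sec, end_sec) cues, where the mouth
    is "open" for the first part of each word and "closed" otherwise.
    """
    cues = []
    prev_end = 0.0
    for text, start, end in words:
        if start > prev_end:                      # silence between words
            cues.append(("closed", prev_end, start))
        split = start + (end - start) * open_fraction
        cues.append(("open", start, split))       # mouth opens on the word
        cues.append(("closed", split, end))       # then closes again
        prev_end = end
    return cues

cues = mouth_cues([("hello", 0.0, 0.4), ("world", 0.5, 0.9)])
```

Something like this regenerates instantly when the script changes, which is the appeal over running a full phoneme/viseme analysis pass on every edit.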
This started as me playing around with Jellypod's Speech SDK and HeyGen's HyperFrames. I wanted a small tool that could go from script to video without a big animation pipeline; the next thing I knew, I was trying to create my own South Park-style show, and here we are. :D
A few details:
- desktop app built with Electron
- supports multiple TTS providers through Jellypod's Speech SDK
- renders via HyperFrames
- lets you upload or generate characters and backdrop scenes
- includes default characters/scenes so you can try it quickly
- open source
It runs from source today. AI features use bring-your-own API keys, but the app itself is fully inspectable and local-first in the sense that there’s no hosted backend or telemetry.
Here are some fun examples of the types of videos you can create:
https://x.com/deepwhitman/status/2046425875789631701
https://x.com/deepwhitman/status/2047040471579697512
And the repo:
https://github.com/Jellypod-Inc/cartoon-studio
Happy to answer questions and appreciate any feedback!

---

The LLM/deterministic split is the smart call here. You can iterate on a script without the rest of the pipeline drifting under you. Curious how far the vowel-per-word heuristic holds before you wish you had Rhubarb, but "regenerates instantly" sounds like the right tradeoff for a studio loop.

---

This looks great. Curious about the lip-sync: viseme set or just open/closed mouths? The South Park style is super forgiving, but HyperFrames quality seems like it'd need more.

---

Very cool! Cartoons are something I've been interested in for a while; I'll definitely check this out.

---

Static video with text-to-speech audio and two circles moving to represent the mouths: "OMG, I might have a show on my hands."

---

I went into this imagining something like Synfig Studio (https://www.synfig.org/) or Moho (https://moho.lostmarble.com/). "Studio" here is quite far from what it actually is: lip-syncing on static characters. Moho also offers far more comprehensive (and comprehensible!) lip-sync: https://lostmarble.com/papagayo/

I get that you're using AI to boost capability with less effort, but at the moment I think the more specialized tools are still a better avenue for this.

Lastly, I followed the link to Jellypod (https://www.jellypod.com/). It's pretty good, but it falls into a vocal "uncanny valley". Even a human reading from a script wouldn't sound that perfect; the enunciations immediately come across as artificial.

Now, if this were an extension to Synfig (also open source!), it would be a much more interesting venture...