Show HN: Slave – local dictation and TTS for macOS (3k words free)

slave.bot

2 points by mesadb 3 months ago · 1 comment · 1 min read

Slave is a macOS app for voice-in, voice-out.

Dictate in most languages. Types into any app.

Listen back with local Piper TTS.

3,000 words free. Then $6.99/month.

Next: joins meetings, transcribes, writes short notes. Later: lightweight Obsidian-style notes built from your text.

Built on Whisper + Piper. Runs on your machine.

Feedback on UX, speed, and pricing is welcome.

mesadb (OP) 3 months ago

Some implementation details, since getting this to work well was not trivial.

My goal was “press hotkey, start talking, see text within ~1–2 seconds” on an M2 MacBook Pro, with support for multiple languages.

First attempts (cloud)
– I tried Hugging Face real-time transcription. It worked, but latency was inconsistent and the cost would not scale.
– I tried OpenAI real-time transcription. Latency was better (I saw 200 ms responses), but with background noise it transcribed the wrong things. I can bring it back if I can make it stable.
– I briefly experimented with Gemini for transcribing and formatting multi-language text. Quality was not consistent enough compared with Whisper for multilingual input.

Local experiments
– I used FFmpeg + Whisper CLI in a bunch of ways: batching, buffering, and trying to “stream” partial results out of Whisper to make it feel live.
– I also tried a local Llama model to format the raw transcript into an email. On an M2 Pro this took ~2 seconds for short emails and got much slower for long text. The output looked nice, but the latency was not acceptable for everyday use.

Where I ended up (for now)
– The current version sticks to FFmpeg + Whisper CLI locally, optimized for short chunks, so you usually see text within about 1–2 seconds.
– I dropped the heavy on-device LLM formatting and kept the formatting logic much simpler so it stays predictable and fast.
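To make the short-chunk idea concrete, here is a minimal Python sketch. It is not the app's actual code. It assumes fixed-length windows (the 2-second default and the optional overlap are illustrative values, not taken from the post) and uses ffmpeg's standard `-ss`, `-t`, `-ar`, and `-ac` options to cut 16 kHz mono WAV chunks. The Whisper call is left as a hypothetical `transcribe` callable, so no particular Whisper CLI flags are assumed.

```python
import os
import subprocess
from typing import Callable, Iterator, List, Tuple


def chunk_windows(total_s: float, chunk_s: float = 2.0,
                  overlap_s: float = 0.0) -> Iterator[Tuple[float, float]]:
    """Yield (start, duration) windows that cover total_s seconds of audio."""
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    if total_s <= 0:
        return
    step = chunk_s - overlap_s
    start = 0.0
    while True:
        yield start, min(chunk_s, total_s - start)
        if start + chunk_s >= total_s:  # this window already reaches the end
            break
        start += step


def ffmpeg_chunk_cmd(src: str, dst: str, start: float, duration: float) -> List[str]:
    """Build an ffmpeg argv that cuts one chunk as 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",    # seek to the chunk start (input option)
        "-t", f"{duration:.3f}",  # read only this many seconds
        "-i", src,
        "-ar", "16000",           # resample to 16 kHz
        "-ac", "1",               # downmix to mono
        "-c:a", "pcm_s16le",
        dst,
    ]


def transcribe_in_chunks(src: str, total_s: float,
                         transcribe: Callable[[str], str],
                         run=subprocess.run,
                         workdir: str = ".") -> Iterator[str]:
    """Cut src into short chunks and yield text per chunk as soon as it is ready.

    `transcribe` is a hypothetical wrapper around whatever Whisper CLI is used.
    """
    for i, (start, duration) in enumerate(chunk_windows(total_s)):
        dst = os.path.join(workdir, f"chunk_{i:03d}.wav")
        run(ffmpeg_chunk_cmd(src, dst, start, duration), check=True)
        yield transcribe(dst)
```

Because `transcribe_in_chunks` is a generator, the caller can type out each chunk's text as it arrives instead of waiting for the whole recording. With a non-zero overlap, the repeated words at chunk boundaries would still need de-duplication, which this sketch does not attempt.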

Next step is to re-introduce “smart” formatting and meeting notes, but only when I can do it without blowing up latency. Happy to dig deeper into any of these if people are curious.
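One way to bring back “smart” formatting without risking latency is to give it a hard time budget and fall back to the simple path when it misses. The sketch below is only an illustration of that idea under stated assumptions: `smart_format` is a hypothetical callable (for example an LLM wrapper), the 0.3-second budget is an arbitrary placeholder, and `simple_format` is a stand-in for the simpler formatting logic, not the app's real rules.

```python
import threading
from typing import Callable, List


def simple_format(text: str) -> str:
    """Cheap, predictable cleanup: collapse whitespace, capitalize, end with a period."""
    cleaned = " ".join(text.split())
    if not cleaned:
        return cleaned
    cleaned = cleaned[0].upper() + cleaned[1:]
    return cleaned if cleaned[-1] in ".!?" else cleaned + "."


def format_within_budget(text: str,
                         smart_format: Callable[[str], str],
                         budget_s: float = 0.3) -> str:
    """Return smart_format(text) if it finishes within budget_s, else simple_format(text)."""
    result: List[str] = []

    def worker() -> None:
        try:
            result.append(smart_format(text))
        except Exception:
            pass  # any failure in the smart path means we fall back

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout=budget_s)  # wait at most budget_s seconds
    return result[0] if result else simple_format(text)
```

The slow formatter keeps running in its daemon thread after the budget expires. That is acceptable for a sketch, but a real app would want a way to cancel it.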
