Now available for macOS
Yes, another voice-to-text app.
Here's why this one exists.
- 50 voice-to-text apps. The same broken tradeoff.
- Some cloud apps screenshot your screen. Their "local mode" is an afterthought.
- Local apps protect privacy. Then hand you a raw transcript to clean up.
- You shouldn't have to pick.
~500ms from shortcut to formatted text. Same loop, two backends:
- Local mode (WhisperKit). Same formatting quality as cloud.
- Cloud mode (ElevenLabs streaming, Groq as fallback). Audio not stored, not used for training.
- No screenshots. Active app detected via macOS APIs.
- 100+ languages in cloud, 14+ local. Formatting works in all of them, with either:
  - Native transcription (cloud or local), or
  - Translation to English (cloud only, Pro)
⌥Space One shortcut. Talk. Done. Feel the difference instantly.
Free forever. Plus 7 days of Pro on us, no card.
“A five-minute chore now takes about ten seconds.”
~500ms end-to-end · macOS 13+ · Apple Silicon & Intel · 2,000 words/week free
💻
Local matches cloud quality
🌍
100+ languages, all formatted
The gap nobody fills
Every app makes you choose
"Great formatting OR privacy. Pick one."
Cloud apps (Wispr Flow, Aqua Voice) format well. The cost: some screenshot your screen every few seconds for "context," and the local mode is either missing or noticeably worse than cloud.
Local apps (SuperWhisper, TalkFlowy) keep your audio on-device. The cost: raw transcripts you punctuate and reformat by hand. Local quality lags cloud.
Shoute resolves the tradeoff. Cloud mode streams audio to ElevenLabs (Groq as fallback) for sub-second transcription. Local mode runs WhisperKit on-device with the same formatting model. Both produce the same context-aware formatting. Neither one ever reads your screen.
What makes Shoute different
Three things no competitor does together
Lots of apps do one of these. None do all three.
🛡
Privacy without compromise
No screen-capture permission requested - we don't take screenshots. Cloud mode streams audio to ElevenLabs (Groq as fallback) and stores nothing. Local mode keeps audio on your Mac, period. You pick which one runs.
✅
Local model. Cloud-grade output.
Most "local modes" are an obvious downgrade. Ours runs WhisperKit on Apple Silicon and tunes the formatting model for dictation, not generic chat. Same clean punctuation, same context-aware structure, same ~500ms loop as cloud. The 7-day Pro trial lets you A/B both.
🎯
Formats to the app you're in
Same words, different output. Casual in Slack. Greeting and sign-off in Mail. Checkboxes in Reminders. Paragraph in Notes. Shoute reads the active app's name through the macOS Accessibility API - no screen capture, no settings to toggle.
Context-aware formatting
Same voice. Different format.
Same dictation style, four destinations, four formats. No setting toggled. No screen captured.
You said
"hey can you push the standup to 3 today um something came up with the client"
Shoute output
Hey, can you push the standup to 3 today? Something came up with the client.
You said
"hey sarah thanks for the proposal let's schedule a call this week to go over next steps does thursday afternoon work"
Shoute output
Hi Sarah,
Thanks for sending over the proposal. I'd like to schedule a call this week to discuss next steps. Does Thursday afternoon work for you?
Best regards
You said
"pick up dry cleaning get almond milk call the dentist about tuesday and order avi's birthday present"
Shoute output
Pick up dry cleaning
Get almond milk
Call the dentist about Tuesday
Order Avi's birthday present
You said
"the main issue with the current approach is that we're triggering the photo evaluation too early um users haven't uploaded enough photos yet so the results aren't meaningful"
Shoute output
The main issue with the current approach is that we're triggering the photo evaluation too early. Users haven't uploaded enough photos yet, so the results aren't meaningful.
100+ languages
Speak your language. Get formatted text.
Most apps add multilingual transcription, then only format well in English. Shoute's formatting works in every language it transcribes. Dictate in Tamil, get a properly structured email in Mail. Spanish in Slack? Punctuated and casual. Or set the output to English and Shoute translates as it formats: speak any language, paste polished English (Pro, cloud-only).
🇺🇸 English
🇨🇳 Chinese
🇮🇳 Hindi
🇪🇸 Spanish
🇸🇦 Arabic
🇫🇷 French
🇵🇹 Portuguese
🇷🇺 Russian
🇯🇵 Japanese
🇩🇪 German
🇰🇷 Korean
🇮🇳 Tamil
+ 88 more, auto-detected
You said
"oye puedes mover la reunión a las tres de la tarde es que me surgió algo con el cliente"
Shoute output
Oye, ¿puedes mover la reunión a las 3 de la tarde? Me surgió algo con el cliente.
You said
"vanakkam sir report ready aayiduchi naalaikku meeting la discuss pannalaam"
Shoute output
வணக்கம் Sir,
Report தயாராகிவிட்டது. நாளைக்கு meeting-ல் discuss பண்ணலாம்.
நன்றி
You said
"das hauptproblem ist dass wir die auswertung zu früh starten ähm die nutzer haben noch nicht genug daten hochgeladen"
Shoute output
Das Hauptproblem ist, dass wir die Auswertung zu früh starten. Die Nutzer haben noch nicht genug Daten hochgeladen.
You said
"sumimasen kyou no meeting san ji ni henkou dekimasuka chotto kyaku no ken de"
Shoute output
すみません、今日のミーティング3時に変更できますか？ちょっと客の件で。
You said
"deployment 3 maniku finish aagum, after that we can start the demo"
Shoute output (translated)
Deployment will finish at 3. After that we can start the demo.
Multilingual support in most apps stops at raw transcription - the formatting intelligence is English-only. Shoute formats every language it transcribes. Checklist in Reminders, formal in Mail, casual in Slack, no matter which language you spoke it in. Need English out? Flip one toggle and Shoute translates while it formats (Pro, cloud-only).
Privacy
What "privacy-first" actually looks like
Every voice app calls itself "privacy-first." Here's what theirs do vs. what ours does.
How most voice apps work
Screenshots, no real local option
✗ Vague audio retention policies - some train models on your voice data
✗ Screen captured every few seconds for "context awareness" (Wispr Flow does this)
✗ Local mode exists but ships an obviously worse formatter
✗ Transcription content fed into product analytics
✗ "Your data may improve our models" - opted in by default
How Shoute works
Private by architecture
✓ Two modes, your pick. Cloud streams audio to ElevenLabs (Groq as fallback). Local runs WhisperKit on-device, zero network calls.
✓ No screenshots. Ever. Active app name comes from the macOS Accessibility API, not pixels on your screen.
✓ Local output matches cloud - same formatting model, same ~500ms loop. The 7-day Pro trial lets you compare both.
✓ Cloud audio is never stored, never logged, never used for training. The transcription provider sees the stream once and discards it.
✓ Forward Alpha is a two-person indie studio. No VC, no investor pressure to harvest data.
How it works
Three steps. One shortcut.
No app to switch to. No copy, no paste. Text just appears.
1
Press one shortcut
From any app, any text field. No window to bring forward, no field to focus.
⌥ Option + Space
2
Speak naturally
Ramble. Use filler words. Change your mind mid-sentence. The formatter strips the "ums" and the false starts before you see anything.
3
Text appears at your cursor
Formatted for the app you were in: casual in Slack, structured in Mail, checkbox list in Reminders. Typically ~500ms from release to text on screen.
Honest comparison
How Shoute stacks up - no spin
We respect every product on this list. Here's the honest read - including where they're still ahead of us.
| App | No Screenshots | Local = Cloud | Smart Format | Multi-Language | Price |
|---|---|---|---|---|---|
| Shoute | ✓ Yes | ✓ Yes | Per-app context | 100+ | $5.83/mo (billed yearly) |
| Wispr Flow | ✗ Takes screenshots | Cloud only | Context-aware | 100+ | $15/mo |
| Aqua Voice | Unknown | Cloud only | Prose polish | Multi | $8-10/mo |
| SuperWhisper | ✓ Yes | Local is worse | Basic | Multi | $249 lifetime |
| TalkFlowy | ✓ Yes | Local only | Raw transcript | 50+ | One-time |
| Sayline | ✓ Yes | Local only | Grammar only | Multi | One-time |
Pricing
Start free. Upgrade when you're hooked.
No credit card. No signup wall. Free tier is 2,000 words a week - enough to know within a day whether voice-to-text changes how you work.
Free
Get started, no card required
$0
Free forever
- Cloud-powered transcription
- AI smart formatting
- Works in every app
- Audio never stored or used for training
- 2,000 words / week
- 1 device
You'll start with Shoute Pro free for 7 days
Pro
Unlimited, cloud + local
$5.83 /mo
Save 17% vs monthly
Billed $69.99/year
- Everything in Free
- Unlimited transcriptions
- Cloud + local transcription
- Context-aware formatting
- Early access to new features
- Priority support
- 2 devices
Local
100% offline, pay once
$49.99
One-time purchase, yours forever
- On-device transcription only
- Nothing leaves your computer
- Works fully offline
- Apple Silicon optimized
- All future updates
- 2 devices
Shoute Teams
Volume pricing - the more seats, the less per seat.
3 seats $4.00/seat/mo
4–9 seats $3.75/seat/mo
10+ seats $3.33/seat/mo
$4.00 /seat/mo
$12.00/mo for 3 seats · Billed $47.99/seat/yr
- Everything in Pro, plus:
- 2 devices per seat
- Centralized billing
- Admin dashboard
- Per-seat license keys
FAQ
Questions you're probably asking
Is the local model really as good as cloud?
For dictation formatting on Apple Silicon, yes. We run WhisperKit for transcription and a formatting model tuned for dictation - not a general-purpose LLM crammed into a small footprint. Output and ~500ms latency match cloud. Don't take our word for it - the 7-day Pro trial lets you A/B both modes.
What do you mean "no screenshots"? Why would a voice app take screenshots?
Some voice-to-text apps capture your screen periodically to understand what you're working on - this is how they provide "context-aware" formatting. Shoute takes a different approach: we detect the frontmost app name (e.g., "Mail" or "Slack") through the macOS Accessibility API. Same formatting intelligence, zero screen capture.
How does context-aware formatting work?
When you trigger Shoute, it checks which app is active. Slack? Output is casual: a relaxed greeting, no sign-off. Mail? Proper email structure with greeting and closing. Reminders? Checklist format. Notes? Clean paragraphs. The AI formatting model adjusts its output based on where your text will land.
What languages are supported?
100+ languages in cloud mode (14+ in local mode), and the formatting intelligence works across all of them. You can dictate in Tamil, Spanish, German, Japanese, Hindi, or Arabic and get properly formatted output - not just raw transcription. You can even switch languages mid-conversation.
What happens if you shut down? Does the app stop working?
The Local plan runs entirely on-device, so it keeps working regardless. Cloud features need our servers - the local option exists precisely so you're never locked in. We're Forward Alpha, a two-person studio that uses Shoute every day; this isn't a launch-and-pivot play.
I can just use Apple's built-in Dictation. Why pay?
Apple Dictation transcribes, but it doesn't format for where your text is going: it can't tell the difference between a Slack message and an email, and it won't turn spoken items into a list. Try dictating a grocery list - you'll get a single run-on sentence. Shoute gives you a checklist. That's the gap.
Who's behind this?
Forward Alpha - a two-person indie studio. We build tools we want to use ourselves. No VC funding, no investor pressure to harvest your data, no growth-at-all-costs playbook. Just a product we're proud of and use every single day.
What people say
Quiet wins, in their words
No incentives, no scripts. Just what people told us after switching.
"I talk way faster than I type, and my inbox finally proves it. What used to be a five minute chore now takes about ten seconds, and honestly it reads better than what I would have typed."

"I just talk and clean text comes out. I barely edit anymore."

"Half my day was writing up meeting notes. Now I talk them out and they are formatted before the next call starts."
"I was sure I would try it for a day and forget about it, like every other tool I download. But it works in everything, Slack, Gmail, my notes, even the random text box I am in right now, and the words just land where my cursor is. A week later and I reach for the shortcut before the keyboard."
"Installed it on a whim during a busy afternoon. Now I cannot go back to typing."
"I figured it would choke on finance jargon. It does not. It gets the terms right every single time."

"English is not my first language, and it still gets me right almost every single time. I use it for pretty much everything I write now, even the tricky technical stuff."
"My wrists used to ache by Friday. Now I do standup notes, PR descriptions, and long replies entirely by talking, and it comes out as clean paragraphs instead of one giant run-on. First tool in a long time that changed how I work, not just where I click."
"Tickets, replies, quick docs, all by voice now. Faster, and my hands stopped hurting."
Try it free. The difference is obvious on the first dictation.
2,000 words a week, free. No credit card. No signup wall on the first session.