Shoute - Voice to Text That Respects Your Privacy


Now available for macOS

Yes, another voice-to-text app.
Here's why this one exists.

  • 50 voice-to-text apps. The same broken tradeoff.
  • Some cloud apps screenshot your screen, and their "local mode" is an afterthought.
  • Local apps protect privacy. Then hand you a raw transcript to clean up.
  • You shouldn't have to pick.

~500ms from shortcut to formatted text. Same loop, two backends:

  • Local mode (WhisperKit). Same formatting quality as cloud.
  • Cloud mode (ElevenLabs streaming, Groq as fallback). Audio not stored, not used for training.
  • No screenshots. Active app detected via macOS APIs.
  • 100+ languages in cloud, 14+ local. Formatting works in all of them, with
    • Native transcription (cloud or local), or
    • Translation to English (cloud only, Pro)

⌥ Option + Space. One shortcut. Talk. Done. Feel the difference instantly.

Free forever. Plus 7 days of Pro on us, no card.

“A five-minute chore now takes about ten seconds.”

~500ms end-to-end · macOS 13+ · Apple Silicon & Intel · 2,000 words/week free

💻

Local matches cloud quality

🌍

100+ languages, all formatted

The gap nobody fills

Every app makes you choose

"Great formatting OR privacy. Pick one."

Cloud apps (Wispr Flow, Aqua Voice) format well. The cost: some screenshot your screen every few seconds for "context," and the local mode is either missing or noticeably worse than cloud.

Local apps (SuperWhisper, TalkFlowy) keep your audio on-device. The cost: raw transcripts you punctuate and reformat by hand. Local quality lags cloud.

Shoute resolves the tradeoff. Cloud mode streams audio to ElevenLabs for sub-second transcription. Local mode runs WhisperKit on-device with the same formatting model. Both produce the same context-aware formatting. Neither one ever reads your screen.

What makes Shoute different

Three things no competitor does together

Lots of apps do one of these. None do all three.

🛡

Privacy without compromise

No screen-capture permission requested - we don't take screenshots. Cloud mode streams audio to ElevenLabs (Groq as fallback) and stores nothing. Local mode keeps audio on your Mac, period. You pick which one runs.

Local model. Cloud-grade output.

Most "local modes" are an obvious downgrade. Ours runs WhisperKit on Apple Silicon with a formatting model tuned for dictation, not generic chat. Same clean punctuation, same context-aware structure, same ~500ms loop as cloud. The free tier lets you A/B both.

🎯

Formats to the app you're in

Same words, different output. Casual in Slack. Greeting and sign-off in Mail. Checkboxes in Reminders. Paragraph in Notes. Shoute reads the active app's name through the macOS Accessibility API - no screen capture, no settings to toggle.

Context-aware formatting

Same voice. Different format.

Same dictation style, four destinations, four formats. No setting toggled. No screen captured.

You said

"hey can you push the standup to 3 today um something came up with the client"

Shoute output (Slack)

Hey, can you push the standup to 3 today? Something came up with the client.

You said

"hey sarah thanks for the proposal let's schedule a call this week to go over next steps does thursday afternoon work"

Shoute output (Mail)

Hi Sarah,

Thanks for sending over the proposal. I'd like to schedule a call this week to discuss next steps. Does Thursday afternoon work for you?

Best regards

You said

"pick up dry cleaning get almond milk call the dentist about tuesday and order avi's birthday present"

Shoute output (Reminders)

Pick up dry cleaning

Get almond milk

Call the dentist about Tuesday

Order Avi's birthday present

You said

"the main issue with the current approach is that we're triggering the photo evaluation too early um users haven't uploaded enough photos yet so the results aren't meaningful"

Shoute output (Notes)

The main issue with the current approach is that we're triggering the photo evaluation too early. Users haven't uploaded enough photos yet, so the results aren't meaningful.

100+ languages

Speak your language. Get formatted text.

Most apps add multilingual transcription, then only format well in English. Shoute's formatting works in every language it transcribes. Dictate in Tamil, get a properly structured email in Mail. Spanish in Slack? Punctuated and casual. Or set the output to English and Shoute translates as it formats: speak any language, paste polished English (Pro, cloud-only).

🇺🇸 English

🇨🇳 Chinese

🇮🇳 Hindi

🇪🇸 Spanish

🇸🇦 Arabic

🇫🇷 French

🇵🇹 Portuguese

🇷🇺 Russian

🇯🇵 Japanese

🇩🇪 German

🇰🇷 Korean

🇮🇳 Tamil

+ 88 more, auto-detected

You said

"oye puedes mover la reunión a las tres de la tarde es que me surgió algo con el cliente"

Shoute output

Oye, ¿puedes mover la reunión a las 3 de la tarde? Me surgió algo con el cliente.

You said

"vanakkam sir report ready aayiduchi naalaikku meeting la discuss pannalaam"

Shoute output

வணக்கம் Sir,

Report தயாராகிவிட்டது. நாளைக்கு meeting-ல் discuss பண்ணலாம்.

நன்றி

You said

"das hauptproblem ist dass wir die auswertung zu früh starten ähm die nutzer haben noch nicht genug daten hochgeladen"

Shoute output

Das Hauptproblem ist, dass wir die Auswertung zu früh starten. Die Nutzer haben noch nicht genug Daten hochgeladen.

You said

"sumimasen kyou no meeting san ji ni henkou dekimasuka chotto kyaku no ken de"

Shoute output

すみません、今日のミーティング3時に変更できますか?ちょっと客の件で。

You said

"deployment 3 maniku finish aagum, after that we can start the demo"

Shoute output (translated)

Deployment will finish at 3. After that we can start the demo.

Multilingual support in most apps stops at raw transcription - the formatting intelligence is English-only. Shoute formats every language it transcribes. Checklist in Reminders, formal in Mail, casual in Slack, no matter which language you spoke it in. Need English out? Flip one toggle and Shoute translates while it formats (Pro, cloud-only).

Privacy

What "privacy-first" actually looks like

Every voice app calls itself "privacy-first." Here's what theirs do vs. what ours does.

How most voice apps work

Screenshots, no real local option

Vague audio retention policies - some train models on your voice data

Screen captured every few seconds for "context awareness" (Wispr Flow does this)

Local mode exists but ships an obviously worse formatter

Transcription content fed into product analytics

"Your data may improve our models" - opted in by default

How Shoute works

Private by architecture

Two modes, your pick. Cloud streams audio to ElevenLabs (Groq as fallback). Local runs WhisperKit on-device, zero network calls.

No screenshots. Ever. Active app name comes from the macOS Accessibility API, not pixels on your screen.

Local output matches cloud - same formatting model, same ~500ms loop. The free tier lets you compare both.

Cloud audio is never stored, never logged, never used for training. The transcription provider sees the stream once and discards it.

Forward Alpha is a two-person indie studio. No VC, no investor pressure to harvest data.

How it works

Three steps. One shortcut.

No app to switch to. No copy, no paste. Text just appears.

1

Press one shortcut

From any app, any text field. No window to bring forward, no field to focus.

⌥ Option + Space

2

Speak naturally

Ramble. Use filler words. Change your mind mid-sentence. The formatter strips the "ums" and the false starts before you see anything.

3

Text appears at your cursor

Formatted for the app you were in: casual in Slack, structured in Mail, checkbox list in Reminders. Typically ~500ms from release to text on screen.

Honest comparison

How Shoute stacks up - no spin

We respect every product on this list. Here's the honest read - including where they're still ahead of us.

App          | No screenshots      | Local = Cloud  | Smart format    | Multi-language | Price
Shoute       | ✓ Yes               | ✓ Yes          | Per-app context | 100+           | $5.83/mo
Wispr Flow   | ✗ Takes screenshots | Cloud only     | Context-aware   | 100+           | $15/mo
Aqua Voice   | Unknown             | Cloud only     | Prose polish    | Multi          | $8-10/mo
SuperWhisper | ✓ Yes               | Local is worse | Basic           | Multi          | $249 lifetime
TalkFlowy    | ✓ Yes               | Local only     | Raw transcript  | 50+            | One-time
Sayline      | ✓ Yes               | Local only     | Grammar only    | Multi          | One-time

Pricing

Start free. Upgrade when you're hooked.

No credit card. No signup wall. Free tier is 2,000 words a week - enough to know within a day whether voice-to-text changes how you work.

Free

Get started, no card required

$0

Free forever

  • Cloud-powered transcription
  • AI smart formatting
  • Works in every app
  • Audio never stored or used for training
  • 2,000 words / week
  • 1 device

You'll start with Shoute Pro free for 7 days

Download Free

Pro

Unlimited, cloud + local

$5.83/mo

Save 17% vs monthly

Billed $69.99/year

  • Everything in Free
  • Unlimited transcriptions
  • Cloud + local transcription
  • Context-aware formatting
  • Early access to new features
  • Priority support
  • 2 devices

Local

100% offline, pay once

$49.99

One-time purchase, yours forever

  • On-device transcription only
  • Nothing leaves your computer
  • Works fully offline
  • Apple Silicon optimized
  • All future updates
  • 2 devices

Shoute Teams

Volume pricing - the more seats, the less per seat.

3 seats $4.00/seat/mo

4–9 seats $3.75/seat/mo

10+ seats $3.33/seat/mo

$4.00/seat/mo

$12.00/mo for 3 seats · Billed $47.99/seat/yr

  • Everything in Pro, plus:
  • 2 devices per seat
  • Centralized billing
  • Admin dashboard
  • Per-seat license keys
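The per-seat arithmetic behind the Pro and Teams prices can be checked in a few lines. This is a minimal Python sketch, not Shoute's billing code. It assumes 3 seats is the smallest team (the page lists no tier below 3) and that every seat is billed at the rate of the tier the whole team falls into, rather than graduated per tier. The function names are hypothetical.

```python
from decimal import Decimal

# Teams tiers from the pricing section: (minimum seats, monthly price per seat).
# Assumptions: 3 seats is the smallest team, and every seat is billed at the
# rate of the tier the whole team falls into (not graduated per tier).
TEAM_TIERS = [
    (10, Decimal("3.33")),  # 10+ seats
    (4, Decimal("3.75")),   # 4-9 seats
    (3, Decimal("4.00")),   # 3 seats
]

CENT = Decimal("0.01")


def per_seat_rate(seats: int) -> Decimal:
    """Monthly per-seat rate for a team of the given size."""
    for min_seats, rate in TEAM_TIERS:  # tiers are ordered largest first
        if seats >= min_seats:
            return rate
    raise ValueError("assumed minimum team size is 3 seats")


def team_monthly_cost(seats: int) -> Decimal:
    """Total monthly cost for the whole team."""
    return per_seat_rate(seats) * seats


def monthly_equivalent(annual_price: Decimal) -> Decimal:
    """Spread an annual price over 12 months, rounded to the cent."""
    return (annual_price / 12).quantize(CENT)


if __name__ == "__main__":
    print(monthly_equivalent(Decimal("69.99")))  # Pro: $69.99/year
    print(monthly_equivalent(Decimal("47.99")))  # Teams: $47.99/seat/year
    for n in (3, 4, 9, 10):
        print(n, per_seat_rate(n), team_monthly_cost(n))
```

Running it prints 5.83 and 4.00, matching the listed $5.83/mo and $4.00/seat/mo, and 12.00 for 3 seats, matching $12.00/mo. Under the all-seats-at-one-rate assumption, a 10-seat team would come to $33.30/mo.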

FAQ

Questions you're probably asking

Is the local model really as good as cloud?

For dictation formatting on Apple Silicon, yes. We run WhisperKit for transcription and a formatting model tuned for dictation - not a general-purpose LLM crammed into a small footprint. Output and ~500ms latency match cloud. Don't take our word for it - the free tier lets you A/B both modes.

What do you mean "no screenshots"? Why would a voice app take screenshots?

Some voice-to-text apps capture your screen periodically to understand what you're working on - this is how they provide "context-aware" formatting. Shoute takes a different approach: we detect the frontmost app name (e.g., "Mail" or "Slack") through the macOS Accessibility API. Same formatting intelligence, zero screen capture.

How does context-aware formatting work?

When you trigger Shoute, it checks which app is active. Slack? Output is casual, with no sign-off. Mail? Proper email structure with greeting and closing. Reminders? Checklist format. Notes? Clean paragraphs. The AI formatting model adjusts its output based on where your text will land.
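As a rough illustration of that routing, here's a Python sketch that maps the frontmost app's name to a formatting style and turns it into an instruction for a formatting model. Everything in it is assumed for illustration: the `FormatStyle` fields, the style table, the fallback for unknown apps, and the instruction wording are hypothetical, not Shoute's code. Detecting the app name (which Shoute reads through the macOS Accessibility API) and the formatting model itself are out of scope; here the app name is just a string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatStyle:
    tone: str        # e.g. "casual", "formal", "neutral"
    structure: str   # "message", "email", "checklist", "paragraphs"
    greeting: bool   # open with a greeting line
    sign_off: bool   # close with a sign-off


# Hypothetical table mirroring the per-app behaviour described above.
STYLES = {
    "Slack": FormatStyle("casual", "message", greeting=False, sign_off=False),
    "Mail": FormatStyle("formal", "email", greeting=True, sign_off=True),
    "Reminders": FormatStyle("neutral", "checklist", greeting=False, sign_off=False),
    "Notes": FormatStyle("neutral", "paragraphs", greeting=False, sign_off=False),
}

# Assumed default for apps the table doesn't know about.
DEFAULT_STYLE = FormatStyle("neutral", "paragraphs", greeting=False, sign_off=False)


def style_for_app(app_name: str) -> FormatStyle:
    """Pick a style from the frontmost app's name alone; no screen content is used."""
    return STYLES.get(app_name, DEFAULT_STYLE)


def build_instruction(app_name: str) -> str:
    """Turn the chosen style into a plain-text instruction for a formatting model."""
    style = style_for_app(app_name)
    parts = [f"Tone: {style.tone}.", f"Structure: {style.structure}."]
    if style.greeting:
        parts.append("Start with a greeting.")
    if style.sign_off:
        parts.append("End with a sign-off.")
    return " ".join(parts)


if __name__ == "__main__":
    for app in ("Slack", "Mail", "Reminders", "Notes", "TextEdit"):
        print(app, "->", build_instruction(app))
```

The point of the design is that the only input about your context is a short app name, so the same dictation can be routed to a different format without any pixels leaving the screen.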

What languages are supported?

100+ languages in cloud mode and 14+ in local mode, and the formatting intelligence works across all of them. You can dictate in Tamil, Spanish, German, Japanese, Hindi, or Arabic and get properly formatted output - not just raw transcription. You can even switch languages mid-conversation.

What happens if you shut down? Does the app stop working?

The Local plan runs entirely on-device, so it keeps working regardless. Cloud features need our servers - the local option exists precisely so you're never locked in. We're Forward Alpha, a two-person studio that uses Shoute every day; this isn't a launch-and-pivot play.

I can just use Apple's built-in Dictation. Why pay?

Apple Dictation times out after 60 seconds, does little beyond basic punctuation, can't tell the difference between a Slack message and an email, and gives you no structure. Try dictating a grocery list - you'll get a single run-on sentence. Shoute gives you a checklist. That's the gap.

Who's behind this?

Forward Alpha - a two-person indie studio. We build tools we want to use ourselves. No VC funding, no investor pressure to harvest your data, no growth-at-all-costs playbook. Just a product we're proud of and use every single day.

What people say

Quiet wins, in their words

No incentives, no scripts. Just what people told us after switching.

"I talk way faster than I type, and my inbox finally proves it. What used to be a five minute chore now takes about ten seconds, and honestly it reads better than what I would have typed."
Eric·Warner Bros. Pictures
"I just talk and clean text comes out. I barely edit anymore."
Jerry·Hotwire
"Half my day was writing up meeting notes. Now I talk them out and they are formatted before the next call starts."
Vicky·Cisco
"I was sure I would try it for a day and forget about it, like every other tool I download. But it works in everything, Slack, Gmail, my notes, even the random text box I am in right now, and the words just land where my cursor is. A week later and I reach for the shortcut before the keyboard."
Jannelle·Real Estate
"Installed it on a whim during a busy afternoon. Now I cannot go back to typing."
Gaurang·Tinder
"I figured it would choke on finance jargon. It does not. It gets the terms right every single time."
Satya·Fidelity
"English is not my first language, and it still gets me right almost every single time. I use it for pretty much everything I write now, even the tricky technical stuff."
Latha
"My wrists used to ache by Friday. Now I do standup notes, PR descriptions, and long replies entirely by talking, and it comes out as clean paragraphs instead of one giant run-on. First tool in a long time that changed how I work, not just where I click."
Iman·Tinder
"Tickets, replies, quick docs, all by voice now. Faster, and my hands stopped hurting."
Amsa·Lenovo

Try it free. The difference is obvious on the first dictation.

2,000 words a week, free. No credit card. No signup wall on the first session.

Mac

Free · ~41 MB

macOS 13 Ventura or later
Universal (Apple Silicon & Intel)

Download for Mac

Resend License Key

Enter the email you used to purchase. We'll resend your license key.

Delete My Data

Enter your email to receive a confirmation link. Clicking it will permanently delete all your data.