One Month of Wispr: From First Release to CLI

When I first wrote about building Wispr, it was a freshly minted macOS dictation app that I had built in a day and a half with Kiro. Since then, the project has taken on a life of its own. The GitHub repo crossed 70 stars, nine people forked it, and I started receiving pull requests and issues from people actually using it daily. That feedback loop has been the best part of this whole experiment.

Over the past few weeks, I shipped 19 releases. Some were bug fixes, the kind of thing you only discover when real people use your app on real hardware. Others were features that kept coming up in issues and conversations: hands-free mode, audio feedback, filler word removal, auto-send Enter for chat apps, Fn key support, and now a command line tool.

Let me focus on the two latest releases.

v1.6 — The Fn/Globe Key

This one came from a simple observation: the 🌐 Fn (Globe) key on modern Macs is the most natural dictation trigger. It sits right where your thumb rests, it is not used for much by default, and Apple themselves use it for dictation in macOS. Several people asked for it.

Starting with v1.6, you can head to Settings → Shortcut and press the Fn key to assign it as your dictation hotkey.

The implementation was more interesting than I expected. The standard Carbon RegisterEventHotKey API that Wispr uses for modifier+key combos does not support the bare Fn key. So v1.6 introduces a dual-backend architecture: Carbon for standard shortcuts, and a CGEventTap for the Fn key specifically. The Fn detection uses the .function modifier flag rather than a keycode, which is the reliable approach on Apple Silicon.
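To make the split concrete, here is a minimal Swift sketch of the idea, not Wispr's actual code. The type names (Shortcut, HotkeyBackend, FnKeyDetector) are hypothetical. It also assumes that, inside an event tap, the Fn state is read from CGEventFlags.maskSecondaryFn, the Core Graphics counterpart of the .function modifier flag.

```swift
import CoreGraphics

/// Hypothetical model of a user-configured shortcut (not Wispr's real type).
enum Shortcut {
    case combo(keyCode: UInt32, carbonModifiers: UInt32) // modifier+key, e.g. Option+Space
    case bareFn                                          // the Fn/Globe key on its own
}

/// The two backends named in the post.
enum HotkeyBackend {
    case carbonHotKey // RegisterEventHotKey
    case eventTap     // CGEventTap watching modifier changes
}

/// Picks a backend: only the bare Fn key needs the event tap.
func backend(for shortcut: Shortcut) -> HotkeyBackend {
    switch shortcut {
    case .combo: return .carbonHotKey
    case .bareFn: return .eventTap
    }
}

enum FnTransition { case pressed, released }

/// Tracks the Fn state across modifier-change events and reports edges only.
struct FnKeyDetector {
    private(set) var isDown = false

    /// Feed the flags of each flagsChanged event; returns a transition,
    /// or nil when the Fn state did not change (e.g. only Shift moved).
    mutating func handle(flags: CGEventFlags) -> FnTransition? {
        let nowDown = flags.contains(.maskSecondaryFn)
        guard nowDown != isDown else { return nil }
        isDown = nowDown
        return nowDown ? .pressed : .released
    }
}
```

In a tap created with CGEvent.tapCreate, the callback would pass event.flags from each flagsChanged event into handle(flags:) and start or stop dictation on the returned transition. Working from the flag rather than a keycode matches the approach described above.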

One gotcha: if you have “Press 🌐 key to → Show Emoji & Symbols” enabled in macOS System Settings, it conflicts with Wispr. The app now shows a helpful tip when you select Fn, guiding you to set that option to “Do Nothing.”

The second release brings the feature I am most excited about. Wispr now ships with a CLI tool that lets you transcribe audio and video files directly from the terminal.

The idea is simple. You have a podcast episode, a meeting recording, or a video file. You want a text transcript. Instead of opening a GUI, dragging a file, and waiting, you just run wispr with the path to the file as its argument.

Output goes to stdout, diagnostics to stderr, so it pipes cleanly into other tools. Want to transcribe a video and save the result?

wispr meeting.mp4 > transcript.txt
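The redirect above only captures the transcript because nothing else is written to stdout. A minimal Swift sketch of that separation follows. The helper name writeDiagnostic, the "wispr:" prefix, and the sample messages are all assumptions made for illustration, not the tool's real output.

```swift
import Foundation

/// Minimal TextOutputStream that forwards text to stderr.
struct StandardErrorStream: TextOutputStream {
    mutating func write(_ string: String) {
        FileHandle.standardError.write(Data(string.utf8))
    }
}

/// Hypothetical helper: writes a prefixed diagnostic line to any text stream.
/// Passing the stream in keeps the function easy to test with a String buffer.
func writeDiagnostic<Target: TextOutputStream>(_ message: String, to target: inout Target) {
    print("wispr: \(message)", to: &target)
}

var errorStream = StandardErrorStream()
writeDiagnostic("decoding meeting.mp4", to: &errorStream) // goes to stderr
print("Hello and welcome to the meeting.")                // placeholder transcript, goes to stdout
```

With this layout, a shell redirect such as > transcript.txt collects only the transcript line, while the diagnostic still shows up in the terminal.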

The CLI uses the same on-device models that the GUI app manages. It supports every format that AVAssetReader handles: mp3, wav, m4a, mp4, mov, and more. No cloud, no API keys, no network required.
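For readers curious what decoding through AVAssetReader can look like, here is a hedged sketch rather than Wispr's actual decoder. It asks the reader for 16 kHz mono Float32 PCM, the format the transcription models expect, and collects the samples into an array. The function name decodeTo16kMonoFloat32 and the DecodeError type are hypothetical, and only the first audio track is read.

```swift
import AVFoundation

enum DecodeError: Error { case noAudioTrack, readerFailed }

/// Decodes the first audio track of `url` into 16 kHz mono Float32 samples.
func decodeTo16kMonoFloat32(_ url: URL) async throws -> [Float] {
    let asset = AVURLAsset(url: url)
    guard let track = try await asset.loadTracks(withMediaType: .audio).first else {
        throw DecodeError.noAudioTrack
    }

    // Ask the reader to convert whatever the file contains into the target PCM format.
    let settings: [String: Any] = [
        AVFormatIDKey: kAudioFormatLinearPCM,
        AVSampleRateKey: 16_000,
        AVNumberOfChannelsKey: 1,
        AVLinearPCMBitDepthKey: 32,
        AVLinearPCMIsFloatKey: true,
        AVLinearPCMIsBigEndianKey: false,
        AVLinearPCMIsNonInterleaved: false,
    ]

    let reader = try AVAssetReader(asset: asset)
    let output = AVAssetReaderTrackOutput(track: track, outputSettings: settings)
    reader.add(output)
    guard reader.startReading() else { throw reader.error ?? DecodeError.readerFailed }

    var samples: [Float] = []
    while let sampleBuffer = output.copyNextSampleBuffer() {
        guard let block = CMSampleBufferGetDataBuffer(sampleBuffer) else { continue }
        let byteCount = CMBlockBufferGetDataLength(block)
        guard byteCount > 0 else { continue }

        // Copy the raw bytes of this buffer into a Float array.
        var chunk = [Float](repeating: 0, count: byteCount / MemoryLayout<Float>.size)
        let status = chunk.withUnsafeMutableBytes { raw in
            CMBlockBufferCopyDataBytes(block, atOffset: 0, dataLength: byteCount,
                                       destination: raw.baseAddress!)
        }
        guard status == kCMBlockBufferNoErr else { throw DecodeError.readerFailed }
        samples.append(contentsOf: chunk)
    }

    guard reader.status == .completed else { throw reader.error ?? DecodeError.readerFailed }
    return samples
}
```

Because the reader does the format conversion, the same function would cover an mp3, a wav, or the audio track of an mp4 without format-specific branches. Files with several audio tracks would need extra handling that this sketch leaves out.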

To install it, open the Wispr menu bar app and click “Install Command Line Tool…”. The app shows you the ln -sf command to symlink the CLI binary into /usr/local/bin/wispr. The menu item only appears when the symlink is not already in place.

Under the hood, the CLI required solving an interesting sandboxing problem. The GUI app runs inside the macOS App Sandbox, which means its downloaded models live in a sandboxed container path. The CLI runs outside the sandbox. The new ModelPaths service detects which environment it is running in and resolves the correct model directory accordingly. Both the CLI and the GUI share a new AudioFileDecoder service that handles the conversion of any supported audio or video format to 16 kHz mono Float32 PCM, which is what the transcription models expect.
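As an illustration of the path resolution, here is a minimal sketch under stated assumptions, not the real ModelPaths service. The bundle identifier com.example.wispr and the Models folder name are placeholders. The sandbox check relies on the APP_SANDBOX_CONTAINER_ID environment variable, a commonly used signal that may not be what Wispr itself uses.

```swift
import Foundation

/// Hypothetical stand-in for the ModelPaths service described above.
struct ModelPathResolver {
    /// Placeholder bundle identifier; the real one may differ.
    var bundleID = "com.example.wispr"
    /// Placeholder folder name inside Application Support.
    var modelsFolder = "Models"
    /// Injected so the logic can be exercised without a real sandbox.
    var environment = ProcessInfo.processInfo.environment
    var homeDirectory = FileManager.default.homeDirectoryForCurrentUser

    /// Sandboxed processes see APP_SANDBOX_CONTAINER_ID in their environment.
    var isSandboxed: Bool { environment["APP_SANDBOX_CONTAINER_ID"] != nil }

    /// Returns the directory where the GUI app stores its downloaded models.
    func modelDirectory() -> URL {
        if isSandboxed {
            // Inside the sandbox, the home directory reported to the app
            // is already the container's Data folder.
            return homeDirectory
                .appendingPathComponent("Library/Application Support/\(modelsFolder)")
        }
        // Outside the sandbox, spell out the container path under the real home.
        return homeDirectory
            .appendingPathComponent("Library/Containers/\(bundleID)/Data")
            .appendingPathComponent("Library/Application Support/\(modelsFolder)")
    }
}
```

Both branches end up at the same folder on disk, which is what lets the CLI reuse models that the GUI app downloaded.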

Parakeet V3

Since the original blog post, Wispr also gained support for NVIDIA’s Parakeet V3 model. This is a 600-million-parameter multilingual ASR model that is significantly faster than Whisper while delivering comparable or better accuracy. It supports 25 European languages with automatic language detection — no need to tell it what language you are speaking.

Alongside Parakeet V3, there is also the Parakeet Realtime 120M model, a smaller variant optimized for low-latency streaming. This one is English-only but provides near-instant results with end-of-utterance detection. When combined with hands-free mode, you press the hotkey once, speak, and Wispr automatically detects when you have finished and inserts the text. No need to press anything to stop.

The model management UI lets you download, switch between, and delete models from a single screen. You can have both Whisper and Parakeet models installed simultaneously and switch between them depending on your needs — Whisper for maximum language coverage, Parakeet for speed.

What’s Next

The project is open source and I welcome contributions. If you have filler words for languages other than English and French, I would love to add them. If you find bugs or have feature ideas, open an issue on the GitHub repository.

Install Wispr with Homebrew:

brew tap sebsto/macos
brew install wispr

Or download the latest release from wispr.stormacq.com.

Happy coding.