Turn your voice into text with a triple-tap — minimal, fast, and macOS-native.
## 🚀 Overview
ctrlSPEAK is your set-it-and-forget-it speech-to-text companion. Triple-tap Ctrl, speak your mind, and watch your words appear wherever your cursor blinks — effortlessly copied and pasted. Built for macOS, it's lightweight, low-overhead, and stays out of your way until you call it.
*Demo video: `ctrlspeak-demo.mp4`*
## ✨ Features
- 🖥️ Minimal Interface: Runs quietly in the background via the command line
- ⚡ Triple-Tap Magic: Start/stop recording with a quick `Ctrl` triple-tap
- 📋 Auto-Paste: Text lands right where you need it, no extra clicks
- 🔊 Audio Cues: Hear when recording begins and ends
- 🍎 Mac Optimized: Harnesses Apple Silicon's MPS for blazing performance
- 🌟 Top-Tier Models: Powered by NVIDIA NeMo and OpenAI Whisper
- 📜 History Browser: Review, search, and copy past transcriptions (press `r` in the UI)
## 🛠️ Get Started
- System: macOS 12.3+ (MPS acceleration supported)
- Python: 3.10
- Permissions:
  - 🎤 Microphone (for recording)
  - ⌨️ Accessibility (for shortcuts)
Grant these on first launch and you're good to go!
## 📦 Installation

### Using Homebrew (Recommended)
```bash
# Basic installation (MLX models only)
brew tap patelnav/ctrlspeak
brew install ctrlspeak

# Recommended: Full installation with all model support
brew install ctrlspeak --with-nvidia --with-whisper

# Check what models are available after installation
ctrlspeak --list-models
```
What each option does:
- `--with-nvidia`: Enables NVIDIA Parakeet and Canary models (recommended for best performance)
- `--with-whisper`: Enables OpenAI Whisper models (optional)
If you get "No module named 'nemo'" errors:
```bash
# Reinstall with NVIDIA support
brew reinstall ctrlspeak --with-nvidia
```

### Manual Installation
Clone the repository:
```bash
git clone https://github.com/patelnav/ctrlspeak.git
cd ctrlspeak
```

Create and activate a virtual environment:
```bash
# Create a virtual environment
python -m venv .venv

# Activate it on macOS/Linux
source .venv/bin/activate
```
Install dependencies:
```bash
# Install core dependencies
pip install -r requirements.txt

# For NVIDIA model support (optional)
pip install -r requirements-nvidia.txt

# For Whisper model support (optional)
pip install -r requirements-whisper.txt

# For Cohere on Apple Silicon / MLX (optional)
pip install -r requirements-cohere-mlx.txt

# For Cohere reference PyTorch backend (optional)
pip install -r requirements-cohere.txt
```
## 🧰 Entry Points
- `ctrlspeak.py`: The full-featured star of the show
- `live_transcribe.py`: Continuous transcription for testing vibes
- `test_transcription.py`: Debug or benchmark with ease
- `test_parallel_models.py`: Compare Nemotron streaming vs Parakeet side-by-side
## Workflow
- Run ctrlSPEAK in a terminal window:
  ```bash
  # If installed with Homebrew
  ctrlspeak

  # If installed manually (from the project directory with activated venv)
  python ctrlspeak.py
  ```
- Triple-tap Ctrl to start recording
- Speak clearly into your microphone
- Triple-tap Ctrl again to stop recording
- The transcribed text will be automatically pasted at your cursor position
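The triple-tap toggle above can be sketched as a simple timestamp check: record when `Ctrl` is pressed and fire once three presses land inside a short window. This is an illustrative sketch only; the class name and the 0.5-second window are assumptions, not ctrlSPEAK's actual implementation:

```python
import time

class TripleTapDetector:
    """Toy sketch: fire when three Ctrl presses occur within `window` seconds."""

    def __init__(self, window=0.5):
        self.window = window  # max spread (seconds) between first and third tap
        self.taps = []        # timestamps of recent Ctrl presses

    def on_ctrl_press(self, now=None):
        """Record a Ctrl press; return True when a triple-tap completes."""
        now = time.monotonic() if now is None else now
        self.taps.append(now)
        # discard taps that have fallen out of the window
        self.taps = [t for t in self.taps if now - t <= self.window]
        if len(self.taps) >= 3:
            self.taps.clear()  # reset so the next triple-tap starts fresh
            return True
        return False
```

In the real app a keyboard-event library would deliver the presses; here `on_ctrl_press` accepts an explicit timestamp so the logic is easy to test.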
### UI Controls
Once running, you can use these keyboard shortcuts in the terminal UI:
- `r` - View transcription history
- `m` - Switch speech recognition models
- `d` - Change audio input device
- `l` - View logs
- `h` - Show help
- `q` - Quit
## Models
ctrlSPEAK uses open-source speech recognition models:
- Parakeet 0.6B (MLX) (default): `mlx-community/parakeet-tdt-0.6b-v3` model optimized for Apple Silicon. Recommended for most users on M1/M2/M3 Macs.
- Canary: NVIDIA NeMo's `nvidia/canary-1b-flash` multilingual model (En, De, Fr, Es) with punctuation, but can be slower. Requires `requirements-nvidia.txt`.
- Canary (180M): NVIDIA NeMo's `nvidia/canary-180m-flash` multilingual model, smaller and less accurate. Requires `requirements-nvidia.txt`.
- Whisper (optional): OpenAI's `openai/whisper-large-v3` model. A fast, accurate, and powerful model that includes excellent punctuation and capitalization. Requires `requirements-whisper.txt`.
- Cohere: The default Cohere shortcut maps to the Apple Silicon MLX backend. Use `cohere` or the `--cohere` shortcut. Requires `requirements-cohere-mlx.txt`.
- Cohere (reference backend): The direct PyTorch/Transformers implementation remains in the repo as a reference path for development and comparison. It runs CPU-first and requires `requirements-cohere.txt`, but it is not exposed as a normal model shortcut.
- Nemotron (Streaming) [Experimental]: NVIDIA's `nvidia/nemotron-speech-streaming-en-0.6b` streaming model with real-time transcription. Text appears as you speak. Requires `requirements-nvidia.txt`.
Note: The `nvidia/parakeet-tdt-1.1b` model is also available for testing, but it is not recommended for general use as it lacks punctuation and is slower than the 0.6B model. Requires `requirements-nvidia.txt`.
The models are automatically downloaded from HuggingFace the first time you use them.
### Listing Supported Models
To see a list of all supported models, use the `--list-models` flag: `ctrlspeak --list-models`.
This will output a list of the available model aliases and their corresponding Hugging Face model names.
### Apple Silicon (MLX) Acceleration
For users on Apple Silicon (M1/M2/M3 Macs), an optimized version of the Parakeet model is available using Apple's MLX framework. This is the default model and provides a significant performance boost.
### Model Selection
You can select a model with the `--model` flag, passing either the full model name from Hugging Face or a short alias.
Short Names:
- `parakeet`: Parakeet 0.6B optimized for Apple Silicon (MLX). (Default)
- `canary`: NVIDIA's Canary 1B Flash model.
- `canary-180m`: NVIDIA's Canary 180M Flash model.
- `whisper`: OpenAI's Whisper v3 model.
- `cohere`: Cohere Transcribe on the Apple Silicon MLX backend. Recommended Cohere shortcut.
- `nemotron`: NVIDIA's Nemotron streaming model. [Experimental]
Full Model URL:
You can also provide a full model URL from Hugging Face. For example:
```bash
ctrlspeak --model nvidia/parakeet-tdt-1.1b
```
This will download and use the specified model.
```bash
# Using Homebrew installation
ctrlspeak --model parakeet       # Default
ctrlspeak --model canary         # Multilingual with punctuation
ctrlspeak --model canary-180m    # The smaller Canary model
ctrlspeak --model canary-v2
ctrlspeak --model whisper        # OpenAI's model
ctrlspeak --cohere               # Shortcut for Cohere on MLX
ctrlspeak --model cohere         # Cohere Transcribe on MLX
ctrlspeak --model parakeet-mlx   # MLX-accelerated model
ctrlspeak --model nemotron       # Streaming (experimental)

# Using manual installation
python ctrlspeak.py --model parakeet
python ctrlspeak.py --model canary
python ctrlspeak.py --model canary-180m
python ctrlspeak.py --model canary-v2
python ctrlspeak.py --model whisper
python ctrlspeak.py --cohere
python ctrlspeak.py --model cohere
python ctrlspeak.py --model parakeet-mlx
python ctrlspeak.py --model nemotron
```
## Transcription History
ctrlSPEAK automatically saves your transcriptions locally for later review.
### History Browser
Access the interactive history browser by pressing `r` in the terminal UI:
- View past transcriptions - Browse all saved transcriptions with timestamps
- Copy to clipboard - Press `Enter` or `c` to copy any previous transcription
- Delete entries - Press `Delete` or `d` to remove unwanted entries
- Navigate - Use arrow keys to browse through your history
- See statistics - View total entries, word count, and recording time
### Data Storage
History is stored locally in a SQLite database:
- Location: `~/.ctrlspeak/history.db`
- What's stored: Timestamp, transcription text, model used, duration, language
- Permissions: File is created with user-only access (700)
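Because the history lives in a plain SQLite file, you can also query it directly from Python. The sketch below assumes a table named `transcriptions` with `created_at` and `text` columns; the actual schema may differ, so inspect it first with `sqlite3 ~/.ctrlspeak/history.db ".schema"`:

```python
import sqlite3

def recent_transcriptions(db_path, limit=10):
    """Return (created_at, text) rows, newest first.

    Assumes a hypothetical `transcriptions` table; adjust the column
    names to match the real schema before using.
    """
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT created_at, text FROM transcriptions "
            "ORDER BY created_at DESC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        conn.close()
```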
### Privacy Controls
You have full control over your transcription history:
```bash
# Disable history saving
ctrlspeak --no-history

# Use custom database location
ctrlspeak --history-db ~/my-custom-path/history.db

# Delete all history data
rm ~/.ctrlspeak/history.db
```
## Command Line Options
```
ctrlspeak [OPTIONS]

Options:
  --model MODEL          Select speech recognition model (default: parakeet)
  --list-models          Show all available models
  --no-history           Disable transcription history saving
  --history-db PATH      Custom path for history database
  --source-lang LANG     Source language code (default: en)
  --target-lang LANG     Target language code (default: en)
  --debug                Enable debug logging
  --check-only           Verify configuration without running
  --check-compatibility  Check system compatibility

Examples:
  ctrlspeak                                   # Run with defaults
  ctrlspeak --model whisper                   # Use Whisper model
  ctrlspeak --no-history                      # Disable history
  ctrlspeak --history-db ~/backup/history.db  # Custom DB location
  ctrlspeak --debug                           # Enable debug mode
```
## Models Tested
- Parakeet 0.6B (NVIDIA) - `nvidia/parakeet-tdt-0.6b-v3` (Default)
- Parakeet 1.1B (NVIDIA) - `nvidia/parakeet-tdt-1.1b`
- Canary (NVIDIA) - `nvidia/canary-1b-flash`
- Canary (NVIDIA) - `nvidia/canary-180m-flash`
- Canary (NVIDIA) - `nvidia/canary-1b-v2`
- Whisper (OpenAI) - `openai/whisper-large-v3`
- Cohere (Apple Silicon MLX) - `CohereLabs/cohere-transcribe-03-2026`
- Nemotron (NVIDIA) - `nvidia/nemotron-speech-streaming-en-0.6b` [Experimental, Streaming]
## Performance Comparison
| Model | Framework | Load Time (s) | Transcription Time (s) | Output Example (test.wav) |
|---|---|---|---|---|
| `parakeet-tdt-0.6b-v3` | MLX (Apple Silicon) | 0.97 | 0.53 | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |
| `parakeet-tdt-0.6b-v3` | NeMo (NVIDIA) | 15.52 | 1.68 | |
| `parakeet-tdt-0.6b-v2` | MLX (Apple Silicon) | 0.99 | 0.56 | "Well, I don't wish to see it any more, observed Phebe, turning away her eyes. It is certainly very like the old portrait." |
| `parakeet-tdt-0.6b-v2` | NeMo (NVIDIA) | 8.23 | 1.61 | |
| `canary-1b-flash` | NeMo (NVIDIA) | 32.06 | 3.20 | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |
| `canary-180m-flash` | NeMo (NVIDIA) | 6.16 | 3.20 | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |
| `whisper-large-v3` | Transformers (OpenAI) | 5.44 | 2.53 | "Well, I don't wish to see it any more, observed Phoebe, turning away her eyes. It is certainly very like the old portrait." |
| `cohere-transcribe-03-2026` | MLX (Apple Silicon) | 1.10 | 1.29 | "Well, I don't wish to see it any more observed Phoebe turning away her eyes. It is certainly very like the old portrait." |
Testing performed on a MacBook Pro (M2 Max) with a 7-second audio file (test.wav). Your results may vary.
Note: The Whisper model uses translate mode to enable proper punctuation and capitalization for English transcription.
## Streaming vs Batch Tradeoffs
The Nemotron model uses real-time streaming transcription where text appears as you speak. This provides instant feedback but has accuracy tradeoffs compared to batch models like Parakeet:
- Streaming (Nemotron): Text appears incrementally during speech. Lower accuracy due to limited context - may miss or misinterpret phrases.
- Batch (Parakeet, etc.): Transcription happens after recording stops. Higher accuracy because the model has the full audio context.
For most users, Parakeet MLX (default) provides the best balance of speed and accuracy.
## Permissions
The app requires:
- Microphone access (for recording audio)
- Accessibility permissions (for global keyboard shortcuts)
You'll be prompted to grant these permissions on first run.
## Troubleshooting
- No sound on recording start/stop: Ensure your system volume is not muted
- Keyboard shortcuts not working: Grant accessibility permissions in System Settings
- Transcription errors: Try speaking more clearly, or switch to a different model with `--model`
## Credits

### Contributors
- @swanhtet1992 - Transcription history feature
### Sound Effects
- Start sound: "Notification Pluck On" from Pixabay
- Stop sound: "Notification Pluck Off" from Pixabay
## License

## Release Process
This section outlines the steps to create a new release and update the associated Homebrew tap.
1. Prepare the Release:
- Ensure the code is stable and tests pass.
- Update the version number in the following files:
  - `VERSION` (e.g., `1.2.0`)
  - `__init__.py` (`__version__ = "1.2.0"`)
  - `pyproject.toml` (`version = "1.2.0"`)
- Commit these version changes:
  ```bash
  git add VERSION __init__.py pyproject.toml
  git commit -m "Bump version to X.Y.Z"
  ```
2. Tag and Push:
- Create a git tag matching the version: `git tag vX.Y.Z`
- Push the commits and the tag to the remote repository:
  ```bash
  git push && git push origin vX.Y.Z
  ```
3. Update Homebrew Tap:
- The source code tarball URL is automatically generated based on the tag (usually `https://github.com/<your-username>/ctrlspeak/archive/refs/tags/vX.Y.Z.tar.gz`).
- Download the tarball using its URL and calculate its SHA256 checksum:

  ```bash
  # Replace URL with the actual tarball link based on the tag
  curl -sL https://github.com/<your-username>/ctrlspeak/archive/refs/tags/vX.Y.Z.tar.gz | shasum -a 256
  ```
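If `shasum` is unavailable, the same SHA-256 checksum can be computed with Python's standard library. This is an illustrative helper, not part of the release tooling:

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```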
- Clone or navigate to your Homebrew tap repository (e.g., `../homebrew-ctrlspeak`).
- Edit the formula file (e.g., `Formula/ctrlspeak.rb`):
  - Update the `url` line with the tag tarball URL.
  - Update the `sha256` line with the checksum you calculated.
  - Optional: Update the `version` line if necessary (though it's often inferred).
  - Optional: If `requirements.txt` or dependencies changed, update the `depends_on` and `install` steps accordingly.
- Commit and push the changes in the tap repository:
  ```bash
  cd ../path/to/homebrew-ctrlspeak  # Or wherever your tap repo is
  git add Formula/ctrlspeak.rb
  git commit -m "Update ctrlspeak to vX.Y.Z"
  git push
  ```
4. Verify (Optional):
- Run `brew update` locally to fetch the updated formula.
- Run `brew upgrade ctrlspeak` to install the new version.
- Test the installed version.