9-11× realtime on your CPU. Local by default.
Voxtype runs Cohere Transcribe (#1 on the Open ASR Leaderboard) faster than realtime on a plain Zen 4 CPU. Parakeet, Whisper, and five more engines are there if you want them. No cloud, no subscription, no telemetry.
Local by Default · Open Source · Wayland Optimized
$ voxtype
[INFO] Voxtype v0.7.0 starting...
[INFO] Engine: cohere (cohere-transcribe-q4f16)
[INFO] Hotkey: SCROLLLOCK
[INFO] Ready! Hold SCROLLLOCK to record.
# User holds ScrollLock and speaks for 4.75s...
[INFO] Recording started...
[INFO] Recording stopped (4.75s)
[INFO] Cohere transcription completed in 0.45s (10.6× realtime)
[INFO] Transcribed: "This is a longer test of voice activity detection with multiple words and phrases."
[INFO] Typed 86 characters
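The realtime figure in the log follows directly from the two durations it reports. A minimal sketch of that arithmetic; the function name `realtime_factor` is made up for illustration and is not part of Voxtype:

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio transcribed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Durations taken from the sample log: 4.75 s recorded, 0.45 s to transcribe.
factor = realtime_factor(4.75, 0.45)
print(f"{factor:.1f}× realtime")
```

Anything above 1.0 means transcription finishes faster than the speech took to record.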
What makes Voxtype different
Built specifically for the modern Linux desktop. Fast on every machine.
Local by default
Your audio stays on your machine. No cloud, no subscription, no telemetry. Optional remote Whisper servers when you want them — never required.
Pauses your music
Auto-pauses Spotify, Plasma media players, and anything that speaks MPRIS the moment you start dictating. Resumes on release. No more accidentally dictating over a podcast.
Meeting mode
Continuous transcription with chunked processing, speaker attribution, and export to Markdown, JSON, SRT, or VTT. Optional LLM post-processing pipes transcripts through Ollama for cleanup or summarization.
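To make the SRT export concrete, here is a rough sketch of how speaker-attributed segments could be rendered as SubRip cues. The `Segment` type and the "Speaker: text" layout are assumptions for illustration, not Voxtype's actual data model or output:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """Hypothetical transcript chunk; field names are illustrative only."""
    start: float  # seconds from the start of the meeting
    end: float
    speaker: str
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.speaker}: {seg.text}\n"
        )
    return "\n".join(cues)
```

VTT differs mainly in its `WEBVTT` header line and in using a dot instead of a comma before the milliseconds.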
Floating waveform OSD
Matches your swayosd band by default — same vertical
position as volume and brightness panels — so the level meter sits
where you already look for system feedback.
New in 0.7.0.
Seven transcription engines
Whisper · Parakeet · Moonshine · SenseVoice · Paraformer · Dolphin · Omnilingual. Switch with one config line. CJK and 1600+ languages covered by the multilingual engines.
Interactive TUI configure
voxtype configure edits every option in
~/.config/voxtype/config.toml for you — no
hand-editing TOML. Auto-downloads missing models, swaps GPU binaries via
pkexec, restarts the daemon when needed. Surfaces in
Walker, fuzzel, and rofi as “Voxtype Configuration”.
New in 0.7.0.
Cohere Transcribe at 9-11× realtime — on your CPU
Quantized to 1.5 GB (q4f16) so it loads fast and runs faster than realtime on a plain Zen 4 CPU. Punctuation, capitalization, and inverse text normalization out of the box. Sits at #1 on the Open ASR Leaderboard. New in 0.7.0.
Parakeet on AMD and NVIDIA GPUs
MIGraphX 7.2 for Radeon RX 7000 and 9000-series cards. Separate CUDA 12 and CUDA 13 binaries so every NVIDIA driver generation works. Vulkan for Whisper across vendors. MIGraphX new in 0.7.0.
Hyprland, Niri, Sway, River, GNOME, KDE
Compositor keybindings everywhere, evdev fallback for X11, Wayland-first typing via wtype with full CJK support. Falls back through dotool → ydotool → clipboard if any layer is unavailable.
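The fallback order described above can be pictured as a first-match search over the available tools. This is only a sketch of the idea: it checks whether a binary is on `PATH`, whereas the real daemon presumably also checks that each layer actually works, and using `wl-copy` as the clipboard step is an assumption.

```python
import shutil
from typing import Callable, Optional, Sequence

# Order from the text: wtype, then dotool, then ydotool, then the clipboard.
# "wl-copy" as the clipboard command is an assumption for this sketch.
FALLBACK_ORDER = ("wtype", "dotool", "ydotool", "wl-copy")


def pick_output_driver(
    candidates: Sequence[str] = FALLBACK_ORDER,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first candidate found on PATH, or None if none exist."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None
```

Passing `which` as a parameter keeps the selection logic testable on machines where none of these tools are installed.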
Dynamic per-engine model loading
Configure every engine and pay memory only for the one you're actually using. Models load on first use and unload when idle, so you can switch engines mid-day without restarting the daemon.
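Load-on-first-use with idle unloading is a common caching pattern. The sketch below shows one way it could work; the class name, the idle timeout, and the loader callables are all hypothetical and say nothing about how Voxtype implements it internally.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class LazyModelCache:
    """Load a model on first request and drop it after an idle period."""

    def __init__(
        self,
        loaders: Dict[str, Callable[[], Any]],
        idle_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loaders = loaders
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._loaded: Dict[str, Tuple[Any, float]] = {}  # name -> (model, last used)

    def get(self, name: str) -> Any:
        """Return the model for `name`, loading it if it is not in memory."""
        self.evict_idle()
        if name in self._loaded:
            model, _ = self._loaded[name]
        else:
            model = self._loaders[name]()  # the expensive load happens here
        self._loaded[name] = (model, self._clock())
        return model

    def evict_idle(self) -> None:
        """Unload every model that has not been used within the idle window."""
        now = self._clock()
        stale = [n for n, (_, last) in self._loaded.items()
                 if now - last >= self._idle_seconds]
        for n in stale:
            del self._loaded[n]

    def loaded_names(self) -> List[str]:
        return sorted(self._loaded)
```

With this shape, switching engines is just a different `get()` call, and the previous model's memory is released once it goes unused for `idle_seconds`.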
Text processing built in
Spoken punctuation ("comma" → ,),
per-user replacement tables for common mistranscriptions, and an optional
post-processing pipe through any LLM or shell script. Fix domain terms,
drop filler words, polish grammar — all without leaving voxtype.
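As an illustration of the first two steps, spoken punctuation and replacement tables, here is a deliberately naive sketch. Only the "comma" mapping comes from the text; the other table entries and all function names are assumptions, and a real implementation would need to handle cases where the user means the literal word.

```python
import re
from typing import Dict

# Only "comma" appears in the text; the other entries are illustrative.
SPOKEN_PUNCTUATION = {"comma": ",", "period": ".", "question mark": "?"}

# Hypothetical per-user table of common mistranscriptions.
REPLACEMENTS = {"vox type": "Voxtype"}


def apply_spoken_punctuation(text: str, table: Dict[str, str] = SPOKEN_PUNCTUATION) -> str:
    """Replace spoken punctuation words and remove the space before them."""
    # Longest phrases first so multi-word entries are matched as a unit.
    for phrase in sorted(table, key=len, reverse=True):
        pattern = re.compile(r"\s*\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        text = pattern.sub(lambda _m, mark=table[phrase]: mark, text)
    return text


def apply_replacements(text: str, table: Dict[str, str] = REPLACEMENTS) -> str:
    """Apply whole-word, case-insensitive replacements."""
    for wrong, right in table.items():
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        text = pattern.sub(lambda _m, fixed=right: fixed, text)
    return text


def process(text: str) -> str:
    return apply_replacements(apply_spoken_punctuation(text))
```

For example, `process("hello comma this is vox type period")` yields `hello, this is Voxtype.`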
One package on every distro
AUR (voxtype, voxtype-bin),
.deb, .rpm, Homebrew on macOS. Signed release
binaries from a reproducible Docker pipeline so what you install is what
we built. MIT licensed.
Latest news
Recent releases and what they bring
Release
v0.7.2: Streaming Dictation, Modifier-Release Guard, Notification Cleanup
Parakeet streaming types text at the cursor as you speak (toggle activation only). A new evdev-based modifier-release guard stops chord hotkeys from triggering on the first typed letter. Notifications now overwrite in place instead of stacking. Experimental aarch64 binaries land for Raspberry Pi, Ampere, and Snapdragon X.
Release
v0.7.1: NixOS source build hotfix
Moves tray-icon and rdev under cfg(target_os = "macos") so Linux builds stop pulling in the GTK3 toolchain. Two community contributions land alongside: osdNative and osdGtk4 as flake outputs, and a GTK4 OSD startup-visibility fix.
Release
v0.7.0: Cohere, macOS, on-screen visualizer, configuration TUI
Cohere as the eighth engine, full macOS support via Homebrew, GTK4 visualizer that follows swayosd convention, MIGraphX replacing ROCm on AMD, CUDA 12/13 split, and a new interactive voxtype configure TUI.
Release
v0.6.6: Media Pause, Audio Feedback, KDE Support
Auto-pause your music while dictating. Audio cues when transcription finishes. KDE Plasma compositor keybindings documented. Seven bug fixes across output drivers, text processing, and the remote backend.
See It In Action
Watch Voxtype transform voice into text
[Demo animation, three scenes: dictating into a terminal AI assistant, a notes document, and a code editor.]
Dictate an AI Prompt
Watch as Voxtype captures speech and types a prompt directly into an AI coding assistant. Perfect for hands-free interaction with agentic tools.
Choose Your Model
Balance speed and accuracy for your needs. With GPU acceleration, even large-v3 achieves sub-second inference!
Whisper Models (English-only)
| Model | Size | Best For |
|---|---|---|
| tiny.en | 39 MB | Quick notes, low-end hardware |
| base.en (Recommended) | 142 MB | Most users |
| small.en | 466 MB | Higher accuracy needs |
| medium.en | 1.5 GB | Professional transcription |
Whisper Models (Multilingual)
| Model | Size | Best For |
|---|---|---|
| tiny | 75 MB | Quick notes, any language |
| base | 142 MB | General multilingual use |
| small | 466 MB | Better multilingual accuracy |
| medium | 1.5 GB | Professional multilingual |
| large-v3 | 3.1 GB | Maximum accuracy |
| large-v3-turbo (GPU Recommended) | 1.6 GB | Fast + accurate |
ONNX Engines (require ONNX binary variant)
| Engine / Model | Size | Best For |
|---|---|---|
| parakeet-tdt-0.6b-v3-int8 (English) | 670 MB | Best English accuracy, built-in punctuation |
| moonshine-base | 237 MB | Fastest CPU inference, English |
| sensevoice-small (CJK) | 239 MB | Chinese, Japanese, Korean, Cantonese, English |
| paraformer-zh | 487 MB | Chinese + English bilingual |
| dolphin-base | 198 MB | 40 languages + 22 Chinese dialects |
| omnilingual-large | 3.9 GB | 1600+ languages, rare and low-resource |
.en models are English-only but faster and more accurate for English. All ONNX engines require the ONNX binary variant. Switch with voxtype setup onnx --enable.
Installation
Get up and running in minutes
Other Linux voice typing tools require you to clone a repo, run an install script, set up a Python virtual environment, and remember to activate it every time you reboot. Voxtype is different: it's a single binary. Install it from your package manager, download a Whisper model, and enable the systemd user service. No virtual environments, no dependency conflicts, no activation scripts. It just works, every time you log in.
CPU requirement: Prebuilt Linux binaries target the x86-64-v3 baseline (AVX2 + FMA + BMI1/2).
Intel Haswell (2013+), AMD Excavator (2015+), and all Ryzen CPUs are supported. On older CPUs, build from source with -C target-cpu=native.
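If you are unsure whether a machine meets this baseline, the kernel's CPU flags are a quick indicator. The sketch below checks only the four features named above (the full x86-64-v3 level includes a few more), and the function name is made up for illustration:

```python
# Flag names as they appear in /proc/cpuinfo on Linux.
REQUIRED_FLAGS = {"avx2", "fma", "bmi1", "bmi2"}


def missing_v3_flags(cpuinfo_text: str) -> set:
    """Return the required flags that the first 'flags' line does not list."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            present = set(line.split(":", 1)[1].split())
            return REQUIRED_FLAGS - present
    # No flags line found (for example on a non-x86 system): report all missing.
    return set(REQUIRED_FLAGS)


# Typical use on Linux:
#   with open("/proc/cpuinfo") as fh:
#       print(missing_v3_flags(fh.read()) or "prebuilt binaries should run")
```

An empty result suggests the prebuilt binaries should run; any missing flag points to the build-from-source route.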
Two AUR packages: voxtype-bin (prebuilt binaries, install in seconds) and voxtype (builds from source via cargo, 20+ minutes). Most users want voxtype-bin.
# Prebuilt binaries (recommended)
paru -S voxtype-bin # or: yay -S voxtype-bin
# Or build from source
paru -S voxtype # or: yay -S voxtype
# Recommended optional dependencies
sudo pacman -S wtype wl-clipboard libnotify gtk4-layer-shell
Next: First-run setup below.
Requires Ubuntu 24.04+ or Debian Trixie+ (glibc 2.39 or newer). Older versions can build from source.
# Download and install
wget https://github.com/peteonrails/voxtype/releases/download/v0.7.2/voxtype_0.7.2-1_amd64.deb
sudo apt install ./voxtype_0.7.2-1_amd64.deb
# Recommended optional dependencies
sudo apt install wtype wl-clipboard libnotify-bin pipewire-alsa playerctl
Next: First-run setup below.
Requires Fedora 40+ (glibc 2.39 or newer).
# Download and install
wget https://github.com/peteonrails/voxtype/releases/download/v0.7.2/voxtype-0.7.2-1.x86_64.rpm
sudo dnf install ./voxtype-0.7.2-1.x86_64.rpm
# Recommended optional dependencies
sudo dnf install wtype wl-clipboard libnotify pipewire-alsa playerctl
Next: First-run setup below.
Apple Silicon (arm64) only. Uses Microsoft ONNX Runtime so all engines are available, including Parakeet on the Neural Engine path.
# Install the signed app bundle
brew install --cask voxtype
# First launch opens a setup wizard that walks you through
# Accessibility permissions, model download, and the FN-key hotkey.
Or download voxtype-0.7.2-macOS-arm64.dmg from the latest release.
NixOS users get a flake with packages for every binary variant. Pulls prebuilt ONNX Runtime; CPU-only variant builds entirely from source.
# Imperative install
nix profile install github:peteonrails/voxtype/v0.7.2#vulkan
# Available outputs: default, vulkan, cuda, rocm, osdGtk4, osdNative
nix build github:peteonrails/voxtype/v0.7.2#osdGtk4
# Or pin in your flake inputs
inputs.voxtype.url = "github:peteonrails/voxtype/v0.7.2";
Next: First-run setup below.
Self-contained binary for distros that don't have a packaged voxtype. Three variants: CPU, Vulkan GPU, and ONNX engines.
# Download a variant from the latest release
wget https://github.com/peteonrails/voxtype/releases/download/v0.7.2/voxtype-0.7.2-x86_64.AppImage
chmod +x voxtype-0.7.2-x86_64.AppImage
# Run directly, or move to ~/.local/bin/ for permanent install
./voxtype-0.7.2-x86_64.AppImage --help
mv voxtype-0.7.2-x86_64.AppImage ~/.local/bin/voxtype
Next: First-run setup below.
Build from source on any distro.
# Install Rust (if needed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Install build dependencies
# Fedora:
sudo dnf install rust cargo alsa-lib-devel clang-devel cmake pkgconf
# Arch:
sudo pacman -S rustup alsa-lib clang cmake pkgconf
# Debian/Ubuntu:
sudo apt install cargo libasound2-dev libclang-dev cmake pkg-config
# Clone and build
git clone https://github.com/peteonrails/voxtype
cd voxtype
cargo build --release
# Install
sudo cp target/release/voxtype /usr/local/bin/
For GPU support: --features gpu-vulkan, --features gpu-cuda, or --features parakeet-migraphx. See docs/INSTALL.md for the full feature matrix.
Next: First-run setup below.
GPU acceleration. Every Linux package ships a Vulkan Whisper binary plus per-vendor ONNX engine binaries (CUDA 12, CUDA 13, MIGraphX for AMD). After install, point the wrapper at the right one:
# Auto-detect GPU and install runtime, then switch the wrapper
sudo voxtype setup gpu --enable
Runtime packages you may need: vulkan-icd-loader (Vulkan), cuda or cuda12.6 (NVIDIA), rocm-hip-runtime 7.x (AMD). Cohere, Parakeet, Moonshine, SenseVoice, Paraformer, Dolphin, and Omnilingual all run on whichever GPU EP the wrapper resolves to.
First-run setup
For Hyprland, Sway, and River users. Uses native compositor keybindings—no input group required!
# Download whisper model and configure
voxtype setup --download
# Disable built-in hotkey (we'll use compositor keybindings)
cat >> ~/.config/voxtype/config.toml << 'EOF'
[hotkey]
enabled = false
EOF
# Enable state file (required for toggle mode)
echo 'state_file = "auto"' >> ~/.config/voxtype/config.toml
# Install as systemd service
voxtype setup systemd
# Optional: Fix modifier key interference (if using SUPER+key combos)
voxtype setup compositor hyprland # or: sway
Then add keybindings to your compositor config:
# Hyprland. Push-to-talk: hold Super+V to record, release to transcribe
bind = SUPER, V, exec, voxtype record start
bindr = SUPER, V, exec, voxtype record stop
# Or use Scroll Lock
bind = , SCROLL_LOCK, exec, voxtype record start
bindr = , SCROLL_LOCK, exec, voxtype record stop
# Sway. Push-to-talk: hold $mod+v to record, release to transcribe
bindsym $mod+v exec voxtype record start
bindsym --release $mod+v exec voxtype record stop
# River. Push-to-talk: hold Super+V to record, release to transcribe
riverctl map normal Super V spawn 'voxtype record start'
riverctl map -release normal Super V spawn 'voxtype record stop'
For GNOME, KDE, and other desktops. Uses kernel-level hotkey detection.
# Add user to input group (required for hotkey detection)
sudo usermod -aG input $USER
# Log out and back in for group change to take effect
# Download whisper model and configure
voxtype setup --download
# Install as systemd service (starts on login)
voxtype setup systemd
# Check status
systemctl --user status voxtype
Works Everywhere
Tested on all major Linux desktops. Optimized for Wayland, works on X11 too.
GNOME
KDE Plasma
Sway
Hyprland
River
Any Wayland
We Want to Hear From You
Voxtype is a young project and your feedback helps make it better
Something Not Working?
If Voxtype doesn't install cleanly, doesn't work on your system, or is buggy in any way, please open an issue. I actively monitor and respond to all reports.
Like Voxtype?
A star on GitHub helps others discover the project. A vote on the AUR package increases the likelihood of inclusion in the Arch [extra] repository.
Ready to try Voxtype?
Start dictating on your Linux desktop today.