Voxtype - Voice to Text for Linux and macOS

9-11× realtime on your CPU. Local by default. Voxtype runs Cohere Transcribe (#1 on the Open ASR Leaderboard) faster than realtime on a plain Zen 4 CPU. Parakeet, Whisper, and five more engines if you want them. No cloud, no subscription, no telemetry.

Local by Default · Open Source · Wayland Optimized

$ voxtype
[INFO] Voxtype v0.7.0 starting...
[INFO] Engine: cohere (cohere-transcribe-q4f16)
[INFO] Hotkey: SCROLLLOCK
[INFO] Ready! Hold SCROLLLOCK to record.

# User holds ScrollLock and speaks for 4.75s...
[INFO] Recording started...
[INFO] Recording stopped (4.75s)
[INFO] Cohere transcription completed in 0.45s (10.6× realtime)
[INFO] Transcribed: "This is a longer test of voice activity detection with multiple words and phrases."
[INFO] Typed 86 characters
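The realtime factor in the log above is the audio duration divided by the transcription time. A minimal sketch of that arithmetic, using a hypothetical helper named `rtf` that is not part of Voxtype:

```shell
# Hypothetical helper (not a Voxtype command):
# realtime factor = seconds of audio / seconds spent transcribing.
rtf() {
    LC_ALL=C awk -v audio="$1" -v proc="$2" 'BEGIN { printf "%.1f\n", audio / proc }'
}

# Figures taken from the session log: 4.75s of audio, 0.45s to transcribe.
rtf 4.75 0.45
```

Anything above 1.0 means transcription finishes faster than the audio took to speak.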

What makes Voxtype different

Built specifically for the modern Linux desktop. Fast on every machine.

Local by default

Your audio stays on your machine. No cloud, no subscription, no telemetry. Optional remote Whisper servers when you want them — never required.

Pauses your music

Auto-pauses Spotify, Plasma media players, and anything that speaks MPRIS the moment you start dictating. Resumes on release. No more accidentally dictating over a podcast.

Meeting mode

Continuous transcription with chunked processing, speaker attribution, and export to Markdown, JSON, SRT, or VTT. Optional LLM post-processing pipes transcripts through Ollama for cleanup or summarization.

Floating waveform OSD

Matches your swayosd band by default — same vertical position as volume and brightness panels — so the level meter sits where you already look for system feedback. New in 0.7.0.

Eight transcription engines

Cohere · Whisper · Parakeet · Moonshine · SenseVoice · Paraformer · Dolphin · Omnilingual. Switch with one config line. CJK and 1600+ languages covered by the multilingual engines.

Interactive TUI configure

voxtype configure edits every option in ~/.config/voxtype/config.toml for you — no hand-editing TOML. Auto-downloads missing models, swaps GPU binaries via pkexec, restarts the daemon when needed. Surfaces in Walker, fuzzel, and rofi as “Voxtype Configuration”. New in 0.7.0.

Cohere Transcribe at 9-11× realtime — on your CPU

Quantized to 1.5 GB (q4f16) so it loads fast and runs faster than realtime on a plain Zen 4 CPU. Punctuation, capitalization, and inverse text normalization out of the box. Sits at #1 on the Open ASR Leaderboard. New in 0.7.0.

Parakeet on AMD and NVIDIA GPUs

MIGraphX 7.2 for Radeon RX 7000 and 9000-series cards. Separate CUDA 12 and CUDA 13 binaries so every NVIDIA driver generation works. Vulkan for Whisper across vendors. MIGraphX new in 0.7.0.

Hyprland, Niri, Sway, River, GNOME, KDE

Compositor keybindings everywhere, evdev fallback for X11, Wayland-first typing via wtype with full CJK support. Falls back through dotool → ydotool → clipboard if any layer is unavailable.
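The fallback order described above can be pictured as a "first tool that is available wins" loop. The sketch below is an illustration only: `pick_output_driver` is a hypothetical function, and it assumes "unavailable" simply means "not on PATH", which may not match how Voxtype actually decides.

```shell
# Sketch only, not Voxtype's code: walk the chain wtype -> dotool -> ydotool
# and fall back to the clipboard when none of them is installed.
pick_output_driver() {
    for tool in wtype dotool ydotool; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    printf '%s\n' clipboard   # last resort when no typing tool is found
}

# Show which backend this machine would end up with under that assumption.
pick_output_driver
```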

Dynamic per-engine model loading

Configure all eight engines and pay memory only for the one you're actually using. Models load on first use and unload when idle, so you can switch engines mid-day without restarting the daemon.

Text processing built in

Spoken punctuation ("comma" → ,), per-user replacement tables for common mistranscriptions, and an optional post-processing pipe through any LLM or shell script. Fix domain terms, drop filler words, polish grammar — all without leaving voxtype.
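To give a feel for what a spoken-punctuation pass does, here is a small stdin-to-stdout filter. It is a sketch under assumptions: the function name `spoken_punctuation` and the three tokens it handles are made up for illustration and are not Voxtype's actual replacement table.

```shell
# Illustrative filter, not Voxtype's implementation: replace a few spoken
# tokens with the symbol they name, attaching it to the preceding word.
spoken_punctuation() {
    sed -E \
        -e 's/ comma( |$)/,\1/g' \
        -e 's/ period( |$)/.\1/g' \
        -e 's/ question mark( |$)/?\1/g'
}

echo "hello comma world period" | spoken_punctuation
```

Because it reads stdin and writes stdout, the same shape works for any shell script you might place at the end of a post-processing pipe.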

One package on every distro

AUR (voxtype, voxtype-bin), .deb, .rpm, Homebrew on macOS. Signed release binaries from a reproducible Docker pipeline so what you install is what we built. MIT licensed.

Latest news

Recent releases and what they bring

May 17, 2026 Release

v0.7.2: Streaming Dictation, Modifier-Release Guard, Notification Cleanup

Parakeet streaming types text at the cursor as you speak (toggle activation only). A new evdev-based modifier-release guard stops chord hotkeys from triggering on the first typed letter. Notifications now overwrite in place instead of stacking. Experimental aarch64 binaries land for Raspberry Pi, Ampere, and Snapdragon X.

May 11, 2026 Release

v0.7.1: NixOS source build hotfix

Moves tray-icon and rdev under cfg(target_os = "macos") so Linux builds stop pulling in the GTK3 toolchain. Two community contributions land alongside: osdNative and osdGtk4 as flake outputs, and a GTK4 OSD startup-visibility fix.

May 9, 2026 Release

v0.7.0: Cohere, macOS, on-screen visualizer, configuration TUI

Cohere as the eighth engine, full macOS support via Homebrew, GTK4 visualizer that follows swayosd convention, MIGraphX replacing ROCm on AMD, CUDA 12/13 split, and a new interactive voxtype configure TUI.

April 18, 2026 Release

v0.6.6: Media Pause, Audio Feedback, KDE Support

Auto-pause your music while dictating. Audio cues when transcription finishes. KDE Plasma compositor keybindings documented. Seven bug fixes across output drivers, text processing, and the remote backend.

See It In Action

Watch Voxtype transform voice into text

Voxtype on Omarchy

Video courtesy of Omarchy, Basecamp, and DHH.

[Animated demo: a foot terminal running ai-assistant next to an editor with main.rs open]

Dictate an AI Prompt

Watch as Voxtype captures speech and types a prompt directly into an AI coding assistant. Perfect for hands-free interaction with agentic tools.

Choose Your Model

Balance speed and accuracy for your needs. With GPU acceleration, even large-v3 achieves sub-second inference!

Whisper Models (English-only)

Model	Size	Best For
`tiny.en`	39 MB	Quick notes, low-end hardware
`base.en` (recommended)	142 MB	Most users
`small.en`	466 MB	Higher accuracy needs
`medium.en`	1.5 GB	Professional transcription

Whisper Models (Multilingual)

Model	Size	Best For
`tiny`	75 MB	Quick notes, any language
`base`	142 MB	General multilingual use
`small`	466 MB	Better multilingual accuracy
`medium`	1.5 GB	Professional multilingual
`large-v3`	3.1 GB	Maximum accuracy
`large-v3-turbo` (GPU recommended)	1.6 GB	Fast + accurate

ONNX Engines (require ONNX binary variant)

Engine / Model	Size	Best For
`parakeet-tdt-0.6b-v3-int8` (English)	670 MB	Best English accuracy, built-in punctuation
`moonshine-base`	237 MB	Fastest CPU inference, English
`sensevoice-small` (CJK)	239 MB	Chinese, Japanese, Korean, Cantonese, English
`paraformer-zh`	487 MB	Chinese + English bilingual
`dolphin-base`	198 MB	40 languages + 22 Chinese dialects
`omnilingual-large`	3.9 GB	1600+ languages, rare and low-resource

.en models are English-only but faster and more accurate for English. All ONNX engines require the ONNX binary variant. Switch with voxtype setup onnx --enable.

Installation

Get up and running in minutes

Other Linux voice typing tools require you to clone a repo, run an install script, set up a Python virtual environment, and remember to activate it every time you reboot. Voxtype is different: it's a single binary. Install it from your package manager, download a Whisper model, and enable the systemd user service. No virtual environments, no dependency conflicts, no activation scripts. It just works, every time you log in.

CPU requirement: Prebuilt Linux binaries target the x86-64-v3 baseline (AVX2 + FMA + BMI1/2). Intel Haswell (2013+), AMD Excavator (2015+), and any Ryzen are supported. On older CPUs, build from source with -C target-cpu=native.
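If you are unsure whether your CPU meets that baseline, the flags line in /proc/cpuinfo will tell you. The sketch below checks only the four features named above; `has_v3_flags` is a hypothetical helper, and the full x86-64-v3 level includes a few more features than these.

```shell
# Sketch: look for the features named in the CPU requirement (AVX2, FMA, BMI1, BMI2)
# in a space-separated flags string. Hypothetical helper, not a Voxtype command.
has_v3_flags() {
    for flag in avx2 fma bmi1 bmi2; do
        case " $1 " in
            *" $flag "*) ;;                               # present, keep going
            *) echo "missing: $flag"; return 1 ;;
        esac
    done
    echo "ok"
}

# On Linux, feed it the first "flags" line from /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
    has_v3_flags "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)" || true
fi
```

A "missing" result suggests the prebuilt binaries will not run and a source build is the safer route.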

Two AUR packages: voxtype-bin (prebuilt binaries, install in seconds) and voxtype (builds from source via cargo, 20+ minutes). Most users want voxtype-bin.

# Prebuilt binaries (recommended)
paru -S voxtype-bin       # or: yay -S voxtype-bin

# Or build from source
paru -S voxtype           # or: yay -S voxtype

# Recommended optional dependencies
sudo pacman -S wtype wl-clipboard libnotify gtk4-layer-shell

Next: First-run setup below.

Requires Ubuntu 24.04+ or Debian Trixie+ (glibc 2.39 or newer). Older versions can build from source.
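A quick way to see which side of the glibc 2.39 line your system falls on before downloading anything. This is a sketch: `version_ge` is a hypothetical helper, `sort -V` is a GNU coreutils extension, and `getconf GNU_LIBC_VERSION` only answers on glibc systems.

```shell
# Sketch: succeed when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# getconf prints something like "glibc 2.39"; keep only the number.
glibc_version=$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{ print $2 }')

if [ -n "$glibc_version" ] && version_ge "$glibc_version" 2.39; then
    echo "glibc $glibc_version: the prebuilt package should install"
else
    echo "glibc ${glibc_version:-unknown}: consider building from source"
fi
```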

# Download and install
wget https://github.com/peteonrails/voxtype/releases/download/v0.7.2/voxtype_0.7.2-1_amd64.deb
sudo apt install ./voxtype_0.7.2-1_amd64.deb

# Recommended optional dependencies
sudo apt install wtype wl-clipboard libnotify-bin pipewire-alsa playerctl

Next: First-run setup below.

Requires Fedora 40+ (glibc 2.39 or newer).

# Download and install
wget https://github.com/peteonrails/voxtype/releases/download/v0.7.2/voxtype-0.7.2-1.x86_64.rpm
sudo dnf install ./voxtype-0.7.2-1.x86_64.rpm

# Recommended optional dependencies
sudo dnf install wtype wl-clipboard libnotify pipewire-alsa playerctl

Next: First-run setup below.

Apple Silicon (arm64) only. Uses Microsoft ONNX Runtime so all engines are available, including Parakeet on the Neural Engine path.

# Install the signed app bundle
brew install --cask voxtype

# First launch opens a setup wizard that walks you through
# Accessibility permissions, model download, and the FN-key hotkey.

Or download voxtype-0.7.2-macOS-arm64.dmg from the latest release.

NixOS users get a flake with packages for every binary variant. Pulls prebuilt ONNX Runtime; CPU-only variant builds entirely from source.

# Imperative install
nix profile install github:peteonrails/voxtype/v0.7.2#vulkan

# Available outputs: default, vulkan, cuda, rocm, osdGtk4, osdNative
nix build github:peteonrails/voxtype/v0.7.2#osdGtk4

# Or pin in your flake inputs
inputs.voxtype.url = "github:peteonrails/voxtype/v0.7.2";

Next: First-run setup below.

AppImage: a self-contained binary for distros that don't package voxtype. Three variants: CPU, Vulkan GPU, and ONNX engines.

# Download a variant from the latest release
wget https://github.com/peteonrails/voxtype/releases/download/v0.7.2/voxtype-0.7.2-x86_64.AppImage
chmod +x voxtype-0.7.2-x86_64.AppImage

# Run directly, or move to ~/.local/bin/ for permanent install
./voxtype-0.7.2-x86_64.AppImage --help
mv voxtype-0.7.2-x86_64.AppImage ~/.local/bin/voxtype

Next: First-run setup below.

# Install Rust (if needed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install build dependencies
#   Fedora:
sudo dnf install rust cargo alsa-lib-devel clang-devel cmake pkgconf
#   Arch:
sudo pacman -S rustup alsa-lib clang cmake pkgconf
#   Debian/Ubuntu:
sudo apt install cargo libasound2-dev libclang-dev cmake pkg-config

# Clone and build
git clone https://github.com/peteonrails/voxtype
cd voxtype
cargo build --release

# Install
sudo cp target/release/voxtype /usr/local/bin/

For GPU support: --features gpu-vulkan, --features gpu-cuda, or --features parakeet-migraphx. See docs/INSTALL.md for the full feature matrix.

Next: First-run setup below.

GPU acceleration. Every Linux package ships a Vulkan Whisper binary plus per-vendor ONNX engine binaries (CUDA 12, CUDA 13, MIGraphX for AMD). After install, point the wrapper at the right one:

# Auto-detect GPU and install runtime, then switch the wrapper
sudo voxtype setup gpu --enable

Runtime packages you may need: vulkan-icd-loader (Vulkan), cuda or cuda12.6 (NVIDIA), rocm-hip-runtime 7.x (AMD). Cohere, Parakeet, Moonshine, SenseVoice, Paraformer, Dolphin, and Omnilingual all run on whichever GPU EP the wrapper resolves to.

First-run setup

For Hyprland, Sway, and River users. Uses native compositor keybindings—no input group required!

# Download whisper model and configure
voxtype setup --download

# Disable built-in hotkey (we'll use compositor keybindings)
cat >> ~/.config/voxtype/config.toml << 'EOF'

[hotkey]
enabled = false
EOF

# Enable state file (required for toggle mode)
echo 'state_file = "auto"' >> ~/.config/voxtype/config.toml

# Install as systemd service
voxtype setup systemd

# Optional: Fix modifier key interference (if using SUPER+key combos)
voxtype setup compositor hyprland  # or: sway

Then add keybindings to your compositor config:

# Hyprland. Push-to-talk: hold Super+V to record, release to transcribe
bind = SUPER, V, exec, voxtype record start
bindr = SUPER, V, exec, voxtype record stop

# Or use Scroll Lock
bind = , SCROLL_LOCK, exec, voxtype record start
bindr = , SCROLL_LOCK, exec, voxtype record stop

# Sway. Push-to-talk: hold $mod+v to record, release to transcribe
bindsym $mod+v exec voxtype record start
bindsym --release $mod+v exec voxtype record stop

# River. Push-to-talk: hold Super+V to record, release to transcribe
riverctl map normal Super V spawn 'voxtype record start'
riverctl map -release normal Super V spawn 'voxtype record stop'

For GNOME, KDE, and other desktops. Uses kernel-level hotkey detection.

# Add user to input group (required for hotkey detection)
sudo usermod -aG input $USER
# Log out and back in for group change to take effect

# Download whisper model and configure
voxtype setup --download

# Install as systemd service (starts on login)
voxtype setup systemd

# Check status
systemctl --user status voxtype
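If the hotkey does nothing after these steps, a common cause is that the group change has not taken effect in your current session yet. A small sketch for checking; `in_group` is a hypothetical helper, not a Voxtype command.

```shell
# Sketch: succeed when group $2 appears in the space-separated group list $1.
in_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# id -nG lists the groups of the current login session.
if in_group "$(id -nG)" input; then
    echo "input group is active in this session"
else
    echo "input group not active yet: log out and back in"
fi
```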

Works Everywhere

Tested on all major Linux desktops. Optimized for Wayland, works on X11 too.

GNOME · KDE Plasma · Sway · Hyprland · River · Any Wayland

We Want to Hear From You

Voxtype is a young project and your feedback helps make it better

Something Not Working?

If Voxtype doesn't install cleanly, doesn't work on your system, or is buggy in any way, please open an issue. I actively monitor and respond to all reports.

Report an Issue

Like Voxtype?

A star on GitHub helps others discover the project. A vote on the AUR package increases the likelihood of inclusion in the Arch extra repository.

Ready to try Voxtype?

Start dictating on your Linux desktop today.