Press a keybind, speak, and get instant text output. A speech-to-text tool that transcribes audio using OpenAI Whisper and outputs to stdout.
Features
- Signal-driven: Press keybind → speak → get text (no GUI needed)
- UNIX philosophy: Outputs transcribed text to stdout for piping to other tools
- On-demand operation: Starts when called, processes audio, then exits
- Audio feedback: Beeps confirm recording start/stop and success
- Wayland native: Works with modern Linux desktops (Hyprland, Niri, etc.)
- Optional local transcription: Run Whisper locally using whisper-rs
Requirements
- Wayland desktop (Hyprland, Niri, GNOME, KDE, etc.)
- OpenAI API key (for Whisper transcription)
- System packages:
# Arch Linux sudo pacman -S pipewire # Ubuntu/Debian sudo apt install pipewire-pulse # Fedora sudo dnf install pipewire-pulseaudio
Optional (for direct typing keybindings):
# Arch Linux sudo pacman -S ydotool # Ubuntu/Debian sudo apt install ydotool # Fedora sudo dnf install ydotool # Setup ydotool permissions and service: sudo usermod -a -G input $USER # Enable and start ydotool daemon service sudo systemctl enable --now ydotool.service # Set socket environment variable (add to ~/.bashrc or ~/.zshrc) echo 'export YDOTOOL_SOCKET=/tmp/.ydotool_socket' >> ~/.bashrc # Log out and back in (or source ~/.bashrc)
Installation
From AUR (Arch Linux)
# Using your preferred AUR helper yay -S waystt-bin # or paru -S waystt-bin
Download Binary
- Download from GitHub Releases
- Install:
wget https://github.com/sevos/waystt/releases/latest/download/waystt-linux-x86_64 mkdir -p ~/.local/bin mv waystt-linux-x86_64 ~/.local/bin/waystt chmod +x ~/.local/bin/waystt # Add to PATH (add to ~/.bashrc or ~/.zshrc) export PATH="$HOME/.local/bin:$PATH"
Quick Start
- Setup configuration:
# Create config directory and file mkdir -p ~/.config/waystt echo "OPENAI_API_KEY=your_api_key_here" > ~/.config/waystt/.env
- Test the application:
# Run waystt and pipe output to see it working waystt | tee /tmp/waystt-output.txt
- Use with signals:
# Transcribe and output to stdout
pkill --signal SIGUSR1 waysttQuick Reference
Common Commands
# Download local model and exit waystt --download-model # Start waystt and save output to file waystt > output.txt # Start waystt and copy output to clipboard waystt --pipe-to wl-copy # Start waystt and type output directly waystt --pipe-to ydotool type --file - # Trigger transcription (if waystt is running) pkill --signal SIGUSR1 waystt
Keybinding Pattern
Most keybindings follow this pattern:
pgrep -x waystt >/dev/null && pkill --signal SIGUSR1 waystt || (waystt [OPTIONS] &)
This means: "If waystt is running, send signal to transcribe. Otherwise, start waystt with specified options."
Keyboard Shortcuts Setup
Hyprland
Add to your ~/.config/hypr/hyprland.conf:
# waystt - Speech to Text (direct typing) bind = SUPER, R, exec, pgrep -x waystt >/dev/null && pkill --signal SIGUSR1 waystt || (waystt --pipe-to ydotool type --file - &) # waystt - Speech to Text (clipboard copy) bind = SUPER SHIFT, R, exec, pgrep -x waystt >/dev/null && pkill --signal SIGUSR1 waystt || (waystt --pipe-to wl-copy &)
Niri
Add to your ~/.config/niri/config.kdl:
binds { // waystt - Speech to Text (direct typing) Mod+R { spawn "sh" "-c" "pgrep -x waystt >/dev/null && pkill --signal SIGUSR1 waystt || (waystt --pipe-to ydotool type --file - &)"; } // waystt - Speech to Text (clipboard copy) Mod+Shift+R { spawn "sh" "-c" "pgrep -x waystt >/dev/null && pkill --signal SIGUSR1 waystt || (waystt --pipe-to wl-copy &)"; } }
Keybinding Functions:
- Super+R (Hyprland) / Mod+R (Niri): Direct typing via ydotool
- Super+Shift+R (Hyprland) / Mod+Shift+R (Niri): Copy to clipboard
Usage Examples
waystt starts on-demand, records audio, transcribes it, outputs to stdout, then exits:
Basic Usage (stdout)
# Terminal 1: Start waystt with output to file waystt > transcription.txt # Terminal 2: Trigger transcription (or use keyboard shortcut) pkill --signal SIGUSR1 waystt
Using --pipe-to Option
The --pipe-to option allows you to pipe transcribed text directly to another command:
# Copy transcription to clipboard waystt --pipe-to wl-copy pkill --signal SIGUSR1 waystt # Type transcription directly into focused window waystt --pipe-to ydotool type --file - pkill --signal SIGUSR1 waystt # Process transcription with sed and copy to clipboard waystt --pipe-to sh -c "sed 's/hello/hi/g' | wl-copy" pkill --signal SIGUSR1 waystt # Save to file with timestamp waystt --pipe-to sh -c "echo \"$(date): $(cat)\" >> speech-log.txt" pkill --signal SIGUSR1 waystt
Configuration
Configuration is read from ~/.config/waystt/.env by default. You can override this location using the --envfile flag:
waystt --envfile /path/to/custom/.env
waystt supports two transcription providers: OpenAI Whisper (default) and Google Speech-to-Text. Choose the one that best fits your needs.
OpenAI Whisper (Default)
OpenAI Whisper offers excellent accuracy and supports automatic language detection.
Required: Create ~/.config/waystt/.env with your OpenAI API key:
OPENAI_API_KEY=your_api_key_here
Optional OpenAI settings:
# Whisper model (whisper-1 is default, most cost-effective) WHISPER_MODEL=whisper-1 # Force specific language (default: auto-detect) WHISPER_LANGUAGE=en # API timeout in seconds WHISPER_TIMEOUT_SECONDS=60 # Max retry attempts WHISPER_MAX_RETRIES=3
Google Speech-to-Text
Google Speech-to-Text provides fast, accurate transcription with support for many languages and dialects.
Setup Steps:
-
Enable Google Cloud Speech-to-Text API:
- Go to Google Cloud Console
- Create a new project or select existing one
- Enable the "Cloud Speech-to-Text API"
- Create a service account and download the JSON key file
-
Configure waystt for Google:
# Switch to Google provider TRANSCRIPTION_PROVIDER=google # Path to your service account JSON file GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service-account-key.json # Primary language (default: en-US) GOOGLE_SPEECH_LANGUAGE_CODE=en-US # Model selection (latest_long for longer audio, latest_short for shorter) GOOGLE_SPEECH_MODEL=latest_long # Optional: Alternative languages for auto-detection (comma-separated) GOOGLE_SPEECH_ALTERNATIVE_LANGUAGES=es-ES,fr-FR,de-DE
Local Whisper (whisper-rs)
Run transcription locally without sending audio to external APIs. Models are downloaded from Hugging Face in GGML format.
# Switch to local provider TRANSCRIPTION_PROVIDER=local # Model file name stored in ~/.local/share/applications/waystt/models/ WHISPER_MODEL=ggml-base.en.bin # Download the model and exit waystt --download-model
Available Models (GGML format):
ggml-tiny.bin- Fastest, least accurate (39 MB)ggml-tiny.en.bin- English-only tiny model (39 MB)ggml-base.bin- Small size, good performance (142 MB)ggml-base.en.bin- English-only base model (142 MB)ggml-small.bin- Better accuracy than base (466 MB)ggml-small.en.bin- English-only small model (466 MB)ggml-medium.bin- Good accuracy/speed balance (1.5 GB)ggml-medium.en.bin- English-only medium model (1.5 GB)ggml-large.bin- Best accuracy, slower (2.9 GB)ggml-large-v1.bin- Large model v1 (2.9 GB)ggml-large-v2.bin- Large model v2 (2.9 GB)ggml-large-v3.bin- Latest large model (2.9 GB)
Recommendations:
- For English only: Use
.en.binmodels for better performance - For speed:
ggml-tiny.en.binorggml-base.en.bin - For accuracy:
ggml-large-v3.binorggml-medium.en.bin - For balance:
ggml-base.en.bin(default)
If the configured model is missing, the application will exit with an error. OpenAI remains the default provider.
Popular Google language codes:
en-US- English (United States)en-GB- English (United Kingdom)es-ES- Spanish (Spain)fr-FR- French (France)de-DE- German (Germany)ja-JP- Japanesezh-CN- Chinese (Simplified)
General Settings
Audio and system settings (apply to both providers):
# Disable audio beeps ENABLE_AUDIO_FEEDBACK=false # Adjust beep volume (0.0 to 1.0) BEEP_VOLUME=0.1 # Debug logging RUST_LOG=debug
Troubleshooting
Audio Issues
If audio recording fails:
- Ensure PipeWire is running:
systemctl --user status pipewire - Check microphone permissions
- Verify microphone is not muted
API Issues
OpenAI Provider:
- Verify your OpenAI API key is valid and has sufficient credits
- Check internet connectivity
- Review logs for specific error messages
Google Provider:
- Verify your service account JSON file path is correct
- Ensure the Speech-to-Text API is enabled in your Google Cloud project
- Check that your service account has the necessary permissions
- Verify your Google Cloud project has billing enabled
- Review logs for specific error messages
Development
Running Tests
Running with Debug Output
# Using default config location (~/.config/waystt/.env) RUST_LOG=debug cargo run # Or using project-local .env file for development RUST_LOG=debug cargo run -- --envfile .env
Building from Source
git clone https://github.com/sevos/waystt.git cd waystt # Create config directory and copy example configuration mkdir -p ~/.config/waystt cp .env.example ~/.config/waystt/.env # Edit ~/.config/waystt/.env with your API key # Build the project cargo build --release # Install to local bin mkdir -p ~/.local/bin cp ./target/release/waystt ~/.local/bin/
License
Licensed under GPL v3.0 or later. Source code: https://github.com/sevos/waystt
See LICENSE for full terms.