
Voxcode - Local speech-to-text for coding agents with line of code reference

Voxcode combines local speech-to-text with ripgrep-style code search. Select code in a file, speak your instructions, and Voxcode pastes the transcript together with a reference to the selected code's file and line numbers.

Example of what gets pasted:

src-tauri/src/ort_init.rs#L49-57

Add an env var ORT_INTRA_THREADS to make the thread count configurable, default to 10

Tip

Works with any IDE (VS Code, JetBrains, Zed, Xcode, Neovim, Emacs) and any AI coding tool (Claude Code, Cursor, Copilot, Codex, OpenCode, Windsurf, Cline, Roo Code, Kilo Code, Gemini CLI, Goose, Amp, Aider — anything that accepts text input).

Quickstart

git clone https://github.com/jensneuse/voxcode.git
cd voxcode
make run

How It Works

Select code in your editor
Press Right Command — Voxcode starts recording and captures which file and lines you selected
Speak your instructions
Press Right Command again — Voxcode transcribes your speech locally and pastes the result at your cursor

What gets pasted is your spoken instruction with the code context attached, so the AI agent knows exactly what code you're referring to:

src-tauri/src/ort_init.rs#L49-57

Add an env var ORT_INTRA_THREADS to make the thread count configurable, default to 10

If the selected code can't be pinpointed to exact lines, Voxcode falls back to pasting the selected text with the transcript:

```rust
let thread_pool = ort::environment::GlobalThreadPoolOptions::default()
    .with_intra_threads(intra_threads)
    .map_err(|e| format!("Failed to configure thread pool: {e}"))?;
```

Refactor this to read the thread count from a config file instead of an env var

Why I Built This

My name is Jens, and I'm the CEO of WunderGraph. At our company, you can't vibe code. Like many others, we're heavy users of agentic coding tools, but we carefully review generated code and interact a lot with the agents.

Our company provides GraphQL Federation infrastructure for companies like eBay, SoundCloud, Paramount and others. Federation is a powerful architectural pattern that lets you compose multiple APIs into a single unified graph, solving two common problems I see companies hit constantly:

API/BFF sprawl - many backends by different teams, many apps, BFFs in the middle, and no clear ownership or visibility. Things break on every change and coding agents make it worse.
Agentic API integration - AI agents waste minutes and hundreds of thousands of tokens exploring internal API landscapes.

Federation solves both by letting teams contribute to a unified Supergraph. Before publishing an API change, you can always validate if you're about to break an existing contract. Agents can generate queries against the Supergraph in seconds, with up to 99% fewer tokens, independently of the scale of the underlying architecture.

If the problems resonate with you, check out WunderGraph.

Although we're 30+ people, I'm still involved in product and engineering, working across a distributed codebase, many repos, many IDEs. I use GoLand for Go, VS Code for prose, and Superset for agentic coding workspaces.

My biggest bottleneck when working with coding agents like Claude Code or Codex was always the friction of telling the agent exactly where in the codebase I want it to make changes. If I want a function refactored or the structure of a test changed, I can tell the model the file and line numbers by hand, give it a file name and a function name, or simply copy-paste the relevant code snippet. In the latter case, I'm wasting tokens and the model has to search for the relevant code in the file anyway, even though I already know exactly where it is and what it's called. Some tools like Claude have plugins that understand which file you have open and which lines you have selected, but that only works in some IDEs, and you can't queue up multiple instructions with this approach.

I found superwhisper, which worked great for speech-to-text, but it didn't solve the code selection problem, and I had to pay for a subscription even though the tool runs locally with an open-source model.

So I built Voxcode. It's deliberately dumb. It assumes nothing about your coding environment or your AI tool. All it does is voice-to-text and ripgrep-style file search. It works, it's fast, and it helps me move faster when working with coding agents.

If you're using a similar workflow you might find it useful too.

How It Works Under the Hood

Voxcode doesn't integrate with any IDE or coding tool directly. It operates on files — reading what you selected, finding where it lives on disk, and pasting the result wherever your cursor is.

Repo indexing

On startup, Voxcode scans your home directory (depth 5) for git repositories on a background thread. Each repo's file count is recorded (respecting .gitignore), and the index is sorted lightest-first. A filesystem watcher (notify crate) picks up new repos automatically.
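
The indexing pass can be pictured with a small std-only sketch. This is not the code from repo_index.rs: the function names are hypothetical, it does not honor .gitignore when counting files, it has no filesystem watcher, and it assumes the scan stops descending once it finds a repo.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Collect directories under `root` that contain a `.git` entry,
/// descending at most `max_depth` levels below `root`.
fn find_repos(root: &Path, max_depth: usize, out: &mut Vec<PathBuf>) {
    if root.join(".git").exists() {
        out.push(root.to_path_buf());
        return; // assumption: nested repos are not searched for
    }
    if max_depth == 0 {
        return;
    }
    if let Ok(entries) = fs::read_dir(root) {
        for entry in entries.flatten() {
            // file_type() does not follow symlinks, which avoids cycles.
            if entry.file_type().map(|t| t.is_dir()).unwrap_or(false) {
                find_repos(&entry.path(), max_depth - 1, out);
            }
        }
    }
}

/// Count regular files in a repo, skipping the `.git` directory.
/// Unlike Voxcode, this sketch ignores .gitignore rules.
fn count_files(dir: &Path) -> usize {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(_) => return 0,
    };
    let mut total = 0;
    for entry in entries.flatten() {
        let path = entry.path();
        match entry.file_type() {
            Ok(t) if t.is_dir() => {
                if !path.ends_with(".git") {
                    total += count_files(&path);
                }
            }
            Ok(t) if t.is_file() => total += 1,
            _ => {}
        }
    }
    total
}

/// Build the index as (repo, file count) pairs, sorted lightest-first.
fn build_index(root: &Path, max_depth: usize) -> Vec<(PathBuf, usize)> {
    let mut repos = Vec::new();
    find_repos(root, max_depth, &mut repos);
    let mut index: Vec<(PathBuf, usize)> = repos
        .into_iter()
        .map(|repo| {
            let count = count_files(&repo);
            (repo, count)
        })
        .collect();
    index.sort_by_key(|(_, count)| *count);
    index
}
```

Calling `build_index(home, 5)` on a background thread would mirror the startup scan described above, with small repos ending up first in the search order.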

Text capture

When you press the hotkey, macOS Accessibility APIs read the selected text and line numbers from the active editor window. If the Accessibility API doesn't return selected text (some apps don't support it), Voxcode falls back to simulating Cmd+C.
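
The capture order can be sketched as "try Accessibility first, then fall back to the clipboard". The two closures below are stand-ins for the real macOS calls, the `Selection` type is hypothetical, and the sketch assumes the Cmd+C fallback yields text without line numbers.

```rust
/// What a capture attempt yields (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
struct Selection {
    text: String,
    /// Start and end line, when the editor exposes them.
    lines: Option<(u32, u32)>,
}

/// Try the Accessibility path first; if it returns nothing usable,
/// fall back to whatever the simulated Cmd+C put on the clipboard.
fn capture_selection(
    accessibility: impl FnOnce() -> Option<Selection>,
    copy_fallback: impl FnOnce() -> Option<String>,
) -> Option<Selection> {
    accessibility()
        .filter(|s| !s.text.is_empty())
        .or_else(|| {
            copy_fallback()
                .filter(|text| !text.is_empty())
                // Assumption: the clipboard carries text only, no line range.
                .map(|text| Selection { text, lines: None })
        })
}
```

Because the fallback is a closure, it only runs (and only touches the clipboard) when the first attempt comes back empty.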

File search - How we made it fast

The naive approach would be to spawn ripgrep once per repo. With 200 repos, that means 200 process spawns at ~25ms each. Even with 16 parallel workers, a cold search took ~9 seconds.

The solution: eliminate process spawns entirely. All repo roots are fed into a single parallel file walker (ignore crate, from the ripgrep ecosystem). It walks files across all repos simultaneously, respecting every .gitignore. Each file is checked with memchr::memmem — the same SIMD-accelerated string matching algorithm ripgrep uses internally. On first match, all walker threads exit immediately.

Result: the same search across 200 repos dropped from ~9 seconds to under 1 second (roughly 10x faster). Repos that produce hits are promoted to the front of the index, so repeat searches in the same project are near-instant (<10ms).
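
A std-only sketch of the two ideas, early exit on first match and promoting the repo that produced the hit. It is a simplification, not the actual implementation: it takes a pre-collected file list instead of using the ignore crate's parallel walker, and a naive byte-window scan stands in for memchr::memmem. The function names are hypothetical.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Naive byte-substring search; a stand-in for memchr::memmem.
fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Search `files` for `needle` on `workers` threads and return the first hit.
fn find_first(files: Vec<PathBuf>, needle: &str, workers: usize) -> Option<PathBuf> {
    let workers = workers.max(1);
    let files = Arc::new(files);
    let needle: Arc<Vec<u8>> = Arc::new(needle.as_bytes().to_vec());
    let found = Arc::new(AtomicBool::new(false));
    let hit: Arc<Mutex<Option<PathBuf>>> = Arc::new(Mutex::new(None));

    let handles: Vec<thread::JoinHandle<()>> = (0..workers)
        .map(|w| {
            let files = Arc::clone(&files);
            let needle = Arc::clone(&needle);
            let found = Arc::clone(&found);
            let hit = Arc::clone(&hit);
            thread::spawn(move || {
                // Worker `w` takes every `workers`-th file, starting at `w`.
                for path in files.iter().skip(w).step_by(workers) {
                    if found.load(Ordering::Relaxed) {
                        return; // another worker already matched
                    }
                    let matched = fs::read(path)
                        .map(|bytes| contains_bytes(&bytes, needle.as_slice()))
                        .unwrap_or(false); // unreadable files count as no match
                    if matched {
                        found.store(true, Ordering::Relaxed);
                        let mut slot = hit.lock().unwrap();
                        if slot.is_none() {
                            *slot = Some(path.clone());
                        }
                        return;
                    }
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let result = hit.lock().unwrap().take();
    result
}

/// Move the repo that produced a hit to the front of the index,
/// keeping the relative order of the others.
fn promote(index: &mut [PathBuf], repo: &Path) {
    if let Some(pos) = index.iter().position(|r| r.as_path() == repo) {
        index[..=pos].rotate_right(1);
    }
}
```

The shared `AtomicBool` is what lets every thread stop as soon as one of them matches, and `promote` is why a second search in the same project checks that repo first.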

Probe strategy

Up to 3 non-trivial lines are picked from the selected text (first, middle, last). A file must contain ALL probe lines to count as a match. This eliminates false positives from common code patterns like imports or braces.
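
A minimal sketch of the probe idea. The README does not define "non-trivial", so the heuristic here (at least 8 characters after trimming, with at least one letter or digit) is an assumption, as are the function names and the choice to compare trimmed lines.

```rust
/// Assumed heuristic: a line is "non-trivial" if, once trimmed, it has
/// at least 8 characters and contains a letter or digit.
fn is_non_trivial(line: &str) -> bool {
    let trimmed = line.trim();
    trimmed.chars().count() >= 8 && trimmed.chars().any(|c| c.is_alphanumeric())
}

/// Pick up to three probes: the first, middle, and last non-trivial line.
fn pick_probes(selection: &str) -> Vec<&str> {
    let candidates: Vec<&str> = selection
        .lines()
        .map(str::trim)
        .filter(|line| is_non_trivial(line))
        .collect();
    let mut probes: Vec<&str> = Vec::new();
    if candidates.is_empty() {
        return probes;
    }
    let last = candidates.len() - 1;
    for idx in [0, last / 2, last] {
        // Short selections can map several positions to the same line.
        if !probes.contains(&candidates[idx]) {
            probes.push(candidates[idx]);
        }
    }
    probes
}

/// A file counts as a match only if it contains every probe.
fn file_matches(contents: &str, probes: &[&str]) -> bool {
    !probes.is_empty() && probes.iter().all(|probe| contents.contains(probe))
}
```

Requiring all probes is what filters out files that merely share a common line, such as an import, with the selection.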

Transcription

The Parakeet TDT model runs via ONNX Runtime, fully local — no audio leaves your machine. Transcription runs on a background thread while you're still speaking.
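
One way to overlap recording and transcription is a channel feeding a worker thread. The sketch below assumes audio arrives in chunks of `f32` samples and uses a caller-supplied closure in place of the real Parakeet call; neither detail is taken from the Voxcode source.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawn a worker that transcribes audio chunks as they arrive.
/// `transcribe` is a stand-in for the real model invocation.
fn spawn_transcriber<F>(transcribe: F) -> (mpsc::Sender<Vec<f32>>, thread::JoinHandle<String>)
where
    F: Fn(&[f32]) -> String + Send + 'static,
{
    let (tx, rx) = mpsc::channel::<Vec<f32>>();
    let handle = thread::spawn(move || {
        let mut parts: Vec<String> = Vec::new();
        // The loop ends once every Sender is dropped, i.e. recording stopped.
        for chunk in rx {
            let text = transcribe(&chunk);
            if !text.is_empty() {
                parts.push(text);
            }
        }
        parts.join(" ")
    });
    (tx, handle)
}
```

The recorder keeps calling `tx.send(chunk)` while you speak; dropping `tx` on the second hotkey press lets `handle.join()` return the full transcript with most of the work already done.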

Paste

The resolved file#Lstart-end reference and your transcript are written to the clipboard; Voxcode then switches back to the previous app and pastes.
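
The two pasted shapes shown earlier (a file reference, or the selected text in a code fence when no file was resolved) can be produced by a small formatter. The `Context` type and the `lang` field are hypothetical; how Voxcode picks the fence language is not described here.

```rust
/// Where the selection was found, if anywhere (hypothetical type).
enum Context<'a> {
    /// The selection was located in a file on disk.
    Resolved { path: &'a str, start: u32, end: u32 },
    /// No file matched; paste the selected text itself.
    Unresolved { selected_text: &'a str, lang: &'a str },
}

/// Build the text that goes onto the clipboard.
fn format_paste(ctx: &Context, transcript: &str) -> String {
    match ctx {
        Context::Resolved { path, start, end } => {
            format!("{path}#L{start}-{end}\n\n{transcript}")
        }
        Context::Unresolved { selected_text, lang } => {
            // Three backticks, built at runtime to keep this example fence-safe.
            let fence = "`".repeat(3);
            format!("{fence}{lang}\n{selected_text}\n{fence}\n\n{transcript}")
        }
    }
}
```

For the example at the top of this README, `format_paste` with path `src-tauri/src/ort_init.rs`, start 49, and end 57 yields the `src-tauri/src/ort_init.rs#L49-57` line followed by a blank line and the transcript.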

Prerequisites

macOS 13.0+
Rust (stable)
Node.js (for the Tauri CLI)
ONNX Runtime — install via Homebrew: brew install onnxruntime
Accessibility permission (System Settings > Privacy & Security > Accessibility)
Microphone permission

Setup

Clone the repository

git clone https://github.com/jensneuse/voxcode.git
cd voxcode

Install dependencies and download models

make

This installs Node dependencies (Tauri CLI), copies the ONNX Runtime dylib from your Homebrew installation, and downloads the Parakeet TDT speech-to-text model from HuggingFace. Subsequent runs are no-ops if everything is already in place.

Run in development mode

make dev

On first run, macOS will prompt for Accessibility and Microphone permissions.

Make Commands

| Command | Description |
| --- | --- |
| `make` | Install all dependencies and download models |
| `make dev` | Run the app in development mode with hot reload |
| `make build` | Build a release `.app` bundle and `.dmg` |
| `make run` | Build for release and open the app |

Environment Variables

| Variable | Description |
| --- | --- |
| `ORT_DYLIB_PATH` | Override the ONNX Runtime dylib location. By default, the app looks in `src-tauri/resources/libonnxruntime.dylib`. |
| `ORT_INTRA_THREADS` | Number of intra-op threads for ONNX Runtime inference. Defaults to `10`, which is ~9% faster than the default (16) on Apple Silicon. |
| `SKIP_REAL_ORT_INIT` | Set to any value to skip ONNX Runtime initialization. Useful for running tests that don't need the inference engine. |
| `RUST_LOG` | Controls log verbosity. Defaults to `info`. Example: `RUST_LOG=debug make dev` |
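
Reading `ORT_INTRA_THREADS` with its documented default could look like the sketch below. The helper names are hypothetical, and treating empty, non-numeric, or zero values as "use the default" is an assumption rather than documented behavior.

```rust
/// Default intra-op thread count, as documented above.
const DEFAULT_INTRA_THREADS: usize = 10;

/// Parse the raw value of ORT_INTRA_THREADS.
/// Missing, non-numeric, or zero values fall back to the default (assumed policy).
fn parse_intra_threads(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_INTRA_THREADS)
}

/// Read the variable from the process environment.
fn intra_threads_from_env() -> usize {
    parse_intra_threads(std::env::var("ORT_INTRA_THREADS").ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the fallback rules easy to test without touching process-wide state.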

Project Structure

dist/                         Frontend (HTML/CSS/JS) served by Tauri
src-tauri/
  src/
    lib.rs                    App entry point, tray menu, recording state machine
    main.rs                   Binary entry point
    audio.rs                  Audio capture via cpal, resampling
    config.rs                 User configuration
    editor_context.rs         Captures active editor file/selection for context
    error.rs                  Error types
    hotkey.rs                 Global hotkey listener (CGEventTap)
    ort_init.rs               ONNX Runtime initialization
    parakeet.rs               Parakeet TDT model loading and transcription
    parakeet_longform.rs      Long-form audio transcription support
    paste.rs                  Clipboard paste
    repo_index.rs             Git repository indexer and parallel file search
    state.rs                  Recording state machine
    window_ext.rs             macOS window management helpers
  tests/                      Integration tests
  benches/                    Performance benchmarks
  resources/                  Models and dylibs (gitignored, created by setup script)
scripts/
  download-models.sh          Downloads models and ONNX Runtime dylib

Multi-Platform

Voxcode is macOS-only. See MULTIPLATFORM.md for a porting guide if you'd like to contribute Windows or Linux support.

License

MIT

Built by Jens Neuse at WunderGraph