# hark
100% offline voice notes from your terminal
## Use Cases
- Voice-to-LLM pipelines: `hark | llm` turns speech into AI prompts instantly (see the sketch after this list)
- Meeting minutes: transcribe calls with speaker identification (`--diarize`)
- System audio capture: record what you hear, not just what you say (`--input speaker`)
- Private by design: no cloud, no API keys, no data leaves your machine
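The first item is just a shell pipeline. A minimal sketch, assuming the `llm` CLI is installed; any command that reads a prompt from stdin works the same way:

```bash
# hark prints the transcript to stdout, so it can feed any LLM CLI;
# the `llm` tool and the prompt below are illustrative, not part of hark
hark | llm "Rewrite this voice note as a bulleted todo list"
```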
## Features
- **Instant Recording** - One keypress to capture your thoughts
- **Multi-Source Capture** - Record microphone, system audio, or both simultaneously
- **High-Accuracy Transcription** - State-of-the-art speech recognition for crystal-clear text
- **Speaker Diarization** - Automatically identify and label who said what
- **Complete Privacy** - 100% offline processing, your audio never leaves your device
- **Flexible Output** - Export as plain text, markdown, or SRT subtitles
- **Multilingual Support** - Transcribe in dozens of languages with automatic detection
- **Blazing Fast** - Hardware-accelerated processing for near real-time results
## Installation

### System Dependencies
Ubuntu/Debian:
```bash
sudo apt install portaudio19-dev
```
macOS:

```bash
brew install portaudio
```
Windows:
No system dependencies required. Audio libraries are bundled with the Python packages.
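With the system dependencies in place, hark itself is a Python package. Assuming the `hark-cli` name used by the diarization instructions below (`pipx inject hark-cli ...`), a typical install would be:

```bash
# Isolated install via pipx (keeps the CLI out of your global site-packages)
pipx install hark-cli

# Or into an existing environment
pip install hark-cli
```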
## Quick Start
```bash
# Record and print to stdout
hark

# Save to file
hark notes.txt

# Use larger model for better accuracy
hark --model large-v3 meeting.md

# Transcribe in German
hark --lang de notes.txt

# Output as SRT subtitles
hark --format srt captions.srt

# Capture system audio (e.g., online meetings)
hark --input speaker meeting.txt

# Capture both microphone and system audio (stereo: L=mic, R=speaker)
hark --input both conversation.txt
```
## Configuration
Hark uses a YAML config file at `~/.config/hark/config.yaml`. CLI flags override config file settings.
```yaml
# ~/.config/hark/config.yaml
recording:
  sample_rate: 16000
  channels: 1          # Use 2 for --input both
  max_duration: 600
  input_source: mic    # mic, speaker, or both

whisper:
  model: base          # tiny, base, small, medium, large, large-v2, large-v3
  language: auto       # auto, en, de, fr, es, ...
  device: auto         # auto, cpu, cuda

preprocessing:
  noise_reduction:
    enabled: true
    strength: 0.5      # 0.0-1.0
  normalization:
    enabled: true
  silence_trimming:
    enabled: true

output:
  format: plain        # plain, markdown, srt
  timestamps: false

diarization:
  hf_token: null            # HuggingFace token (required for --diarize)
  local_speaker_name: null  # Your name in stereo mode, or null for SPEAKER_00
```
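Because CLI flags win over the file, the config can hold everyday defaults while individual runs override them. For example, with `model: base` above, a one-off higher-accuracy run in German would be:

```bash
# Overrides the configured model and language for this run only
hark --model large-v3 --lang de interview.txt
```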
## Audio Input Sources
Hark supports three input modes via `--input` or `recording.input_source`:
| Mode | Description |
|---|---|
| `mic` | Microphone only (default) |
| `speaker` | System audio only (loopback capture) |
| `both` | Microphone + system audio as stereo (L=mic, R=speaker) |
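To make stereo capture the default through the config file rather than the `--input` flag, the channel count has to match, as the config comment above notes (`channels: 1  # Use 2 for --input both`):

```yaml
# ~/.config/hark/config.yaml
recording:
  input_source: both   # mic + system audio
  channels: 2          # stereo: L = mic, R = speaker
```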
### System Audio Capture
System audio capture (`--input speaker` or `--input both`) works differently on each platform:
Linux (PulseAudio/PipeWire):
Uses monitor sources automatically. To verify your system supports it:
```bash
pactl list sources | grep -i monitor
```

You should see output like:

```
Name: alsa_output.pci-0000_00_1f.3.analog-stereo.monitor
Description: Monitor of Built-in Audio
```
macOS:
Requires the BlackHole virtual audio driver:
1. Install BlackHole:

   ```bash
   brew install blackhole-2ch
   ```

2. Open Audio MIDI Setup (in Applications → Utilities)
3. Click + → Create Multi-Output Device
4. Check both your speakers/headphones AND BlackHole 2ch
5. Set the Multi-Output Device as your default output in System Preferences → Sound
Now hark can capture system audio through BlackHole.
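To double-check that the driver is visible before recording, you can list macOS audio devices (`system_profiler` is a stock macOS tool, nothing hark-specific):

```bash
# "BlackHole 2ch" should appear among the listed audio devices
system_profiler SPAudioDataType | grep -i blackhole
```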
Windows 10/11:
Uses WASAPI loopback automatically. No setup required; just ensure your audio output device is working.
## Speaker Diarization
Identify who said what in multi-speaker recordings using WhisperX.
### Setup
1. Install diarization dependencies:

   ```bash
   pipx inject hark-cli whisperx
   # Or with pip: pip install hark-cli[diarization]
   ```

2. Get a HuggingFace token (required for pyannote models):

   - Create an account at https://huggingface.co
   - Accept the pyannote model licenses
   - Create a token at https://huggingface.co/settings/tokens

3. Add the token to your config:

   ```yaml
   # ~/.config/hark/config.yaml
   diarization:
     hf_token: "hf_xxxxxxxxxxxxx"
   ```
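If you installed via pipx, a quick way to confirm the injected dependency really landed in hark's virtual environment is pip's `show` command run inside that venv:

```bash
# Prints whisperx package metadata if the injection succeeded
pipx runpip hark-cli show whisperx
```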
### Usage
The `--diarize` flag enables speaker identification. It requires `--input speaker` or `--input both`.
```bash
# Transcribe a meeting with speaker identification
hark --diarize --input speaker meeting.txt

# Specify expected number of speakers (improves accuracy)
hark --diarize --speakers 3 --input speaker meeting.md

# Skip interactive speaker naming for batch processing
hark --diarize --no-interactive --input speaker meeting.txt

# Stereo mode: separate local user from remote speakers
hark --diarize --input both conversation.md

# Combine with other options
hark --diarize --input speaker --format markdown --model large-v3 meeting.md
```
| Flag | Description |
|---|---|
| `--diarize` | Enable speaker identification |
| `--speakers N` | Hint for expected speaker count (improves clustering) |
| `--no-interactive` | Skip post-transcription speaker naming prompt |
Note: Diarization adds processing time. For a 5-minute recording, expect ~1-2 minutes on GPU or ~5-10 minutes on CPU.
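Whether you get the GPU numbers depends on `whisper.device` in the config and on PyTorch actually seeing a CUDA device; a quick check for the latter (assuming the CUDA build of PyTorch is installed alongside hark):

```bash
# True means CUDA is visible to PyTorch, and thus to Whisper/pyannote
python -c "import torch; print(torch.cuda.is_available())"
```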
### Output Format
With diarization enabled, output includes speaker labels and timestamps:
Plain text:
```
[00:02] [SPEAKER_01] Hello everyone, let's get started.
[00:05] [SPEAKER_02] Thanks for joining. Let me share my screen.
```
Markdown:
```markdown
# Meeting Transcript

**SPEAKER_01** (00:02)
Hello everyone, let's get started.

**SPEAKER_02** (00:05)
Thanks for joining. Let me share my screen.

---

_2 speakers detected • Duration: 5:23 • Language: en (98% confidence)_
```
### Interactive Naming
After transcription, hark will prompt you to identify speakers:
```
Detected 2 speaker(s) to identify.

SPEAKER_01 said: "Hello everyone, let's get started."
Who is this? [name/skip/done]: Alice

SPEAKER_02 said: "Thanks for joining. Let me share my screen."
Who is this? [name/skip/done]: Bob
```
Use `--no-interactive` to skip this prompt.
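In stereo mode (`--input both`) the local side can be pre-labeled with the `diarization.local_speaker_name` key from the config above; per its comment, leaving it null keeps the generic SPEAKER_00 label. A minimal sketch:

```yaml
# ~/.config/hark/config.yaml
diarization:
  local_speaker_name: "Alice"   # label for the local (mic) side in stereo recordings
```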
## Known Issues
Slow diarization? The pyannote models may default to CPU inference. For GPU acceleration:
```bash
pip install --force-reinstall onnxruntime-gpu
```
See WhisperX #499 for details.
## Development
```bash
git clone https://github.com/FPurchess/hark.git
cd hark
uv sync --extra test
uv run pre-commit install
uv run pytest
```
## Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
## License
Distributed under the AGPLv3 License.
## Acknowledgments
This project would not exist without the hard work of others, first and foremost the maintainers and contributors of the projects mentioned below: