WsprFlowPy

An open-source, high-performance dictation tool that uses AI to format speech into clean, usable text, inspired by Wisprflow.

System Requirements

  • OS: Windows 10/11 (uses Windows APIs for window detection and keyboard input)
  • GPU: NVIDIA GPU with CUDA support (recommended for best performance)
    • CPU mode available but significantly slower
  • RAM: 4GB+ (8GB+ recommended for larger Whisper models)
  • Microphone: Any input device (USB microphones like HyperX QuadCast work great)

Prerequisites

1. Python 3.9+

Download and install Python from python.org

2. NVIDIA GPU Setup (For GPU Acceleration)

Install NVIDIA Driver

  • Download the latest driver from NVIDIA's website
  • Select your GPU model and install

Install CUDA Toolkit 12.4

  1. Download CUDA Toolkit 12.4
  2. Run the installer and select:
    • CUDA Toolkit
    • CUDA Development
    • CUDA Runtime
  3. Default installation path: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4

Install cuDNN 9.x

  1. Download cuDNN 9.17 for CUDA 12.x
  2. Extract the archive
  3. Copy files to CUDA installation:
    cuDNN/bin/*.dll → C:\Program Files\NVIDIA\CUDNN\v9.17\bin\12.9\
    cuDNN/include/*.h → C:\Program Files\NVIDIA\CUDNN\v9.17\include\
    cuDNN/lib/*.lib → C:\Program Files\NVIDIA\CUDNN\v9.17\lib\
    

Verify CUDA Installation

nvcc --version
nvidia-smi

Note: If you don't have an NVIDIA GPU, you can still use CPU mode by changing WHISPER_DEVICE = "cpu" in wsprv2.py (line 30).

3. FFmpeg (Optional but Recommended)

While not strictly required for this project, FFmpeg can improve audio compatibility:

  1. Download from ffmpeg.org
  2. Extract to a folder (e.g., C:\ffmpeg)
  3. Add to PATH: C:\ffmpeg\bin

Installation

1. Clone the Repository

git clone https://github.com/abhijitxy/WsprFlowPy.git
cd WsprFlowPy

2. Create Virtual Environment (Recommended)

python -m venv venv
venv\Scripts\activate

3. Install Python Dependencies

pip install -r requirements.txt

4. Configure Environment Variables

  1. Copy the example environment file:

    copy .env.example .env

  2. Edit .env and add your API keys:

    # Optional: Hugging Face token for downloading models
    HF_TOKEN=your_huggingface_token_here
    
    # Required: OpenRouter API key for Claude formatting
    OPENROUTER_API_KEY=your_openrouter_api_key_here
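
For reference, here is a minimal sketch of how these variables might be loaded at startup. It assumes the app uses the python-dotenv package; the actual loading code in wsprv2.py may differ:

# Hypothetical startup check, assuming python-dotenv is installed
import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into os.environ

# Fail fast if the required key is missing
if not os.environ.get("OPENROUTER_API_KEY"):
    raise SystemExit("OPENROUTER_API_KEY is not set - add it to your .env file.")

# HF_TOKEN is optional and only needed for gated model downloads
hf_token = os.environ.get("HF_TOKEN")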

Getting API Keys

  • OpenRouter: Sign up at openrouter.ai and create an API key
    • Used for AI-powered transcript formatting with Claude
    • Pay-per-use pricing (very affordable for personal use)
  • Hugging Face (optional): Sign up at huggingface.co
    • Only needed if you encounter model download issues

5. Verify CUDA Paths

The application automatically adds CUDA to PATH (lines 22-23 in wsprv2.py). Verify these paths match your installation:

os.environ['PATH'] = r'C:\Program Files\NVIDIA\CUDNN\v9.17\bin\12.9' + os.pathsep + os.environ['PATH']
os.environ['PATH'] = r'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin' + os.pathsep + os.environ['PATH']

Adjust if your CUDA/cuDNN is installed elsewhere.
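
If you want to confirm those directories exist before they are prepended, a quick sanity check (the paths below are the defaults from above; adjust to your install):

# Warn early if the expected CUDA/cuDNN directories are missing.
import os

cuda_dirs = [
    r'C:\Program Files\NVIDIA\CUDNN\v9.17\bin\12.9',
    r'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin',
]
for d in cuda_dirs:
    if os.path.isdir(d):
        os.environ['PATH'] = d + os.pathsep + os.environ['PATH']
    else:
        print(f"Warning: expected CUDA directory not found: {d}")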

Usage

First Run

  1. Start the application:

    python wsprv2.py

  2. Microphone Selection: On first run, you'll be prompted to select your microphone:

    Available input devices:
    [0] Microsoft Sound Mapper - Input
    [1] HyperX QuadCast S (2- HyperX)
    [2] Microphone Array (Realtek)
    ...
    Select input device index: 1
    
    • Your selection is saved to mic_config.json and remembered for future runs
    • The app will auto-suggest preferred mics (HyperX, QuadCast, etc.); see the sketch after this list
  3. Wait for initialization:

    --- Initializing Clients ---
    Loading Whisper model...
    Whisper model loaded in 2.34s.
    OpenRouter client initialized.
    
  4. You're ready! The status badge will appear at the bottom center of your screen.
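
For reference, a minimal sketch of how the device enumeration and preferred-mic suggestion in step 2 might work with sounddevice. MIC_PREFERRED_KEYWORDS comes from wsprv2.py; the helper functions here are illustrative, not the exact code:

# Sketch of microphone enumeration; helper names are hypothetical.
import json
import sounddevice as sd

MIC_PREFERRED_KEYWORDS = ["hyperx", "quadcast"]  # see wsprv2.py (line 72)

def list_input_devices():
    """Return (index, name) for every device that can capture audio."""
    return [(i, d["name"]) for i, d in enumerate(sd.query_devices())
            if d["max_input_channels"] > 0]

def suggest_device(devices):
    """Pick the first device whose name matches a preferred keyword."""
    for index, name in devices:
        if any(kw in name.lower() for kw in MIC_PREFERRED_KEYWORDS):
            return index
    return None

def save_choice(index, path="mic_config.json"):
    """Persist the selection so future runs skip the prompt."""
    with open(path, "w") as f:
        json.dump({"device_index": index}, f)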

Controls

  • Ctrl + Alt (Hold): Start recording for dictation
    • Hold both keys and speak
    • Release to stop recording
    • Transcribed and formatted text is automatically pasted
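
This hold-to-record behavior could be implemented with the keyboard package, as in the hedged sketch below; start_recording and stop_recording are placeholders for the app's real handlers:

# Hold-to-record polling loop; a sketch, not the exact code in wsprv2.py.
import time
import keyboard

def start_recording():  # placeholder for the real handler
    print("recording...")

def stop_recording():   # placeholder for the real handler
    print("stopped, transcribing...")

recording = False
while True:
    held = keyboard.is_pressed("ctrl") and keyboard.is_pressed("alt")
    if held and not recording:
        recording = True
        start_recording()
    elif not held and recording:
        recording = False
        stop_recording()
    time.sleep(0.05)  # poll at ~20 Hz to keep CPU usage negligible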

Status Badge

The minimal UI badge shows the current status:

  • Thin line: Idle, ready to record
  • Red waveform: Recording in progress
  • Pulsing dots: Processing audio

Configuration

Model Settings

Edit wsprv2.py to customize:

# Whisper STT Configuration (line 29-32)
WHISPER_MODEL_NAME = "small.en"      # tiny.en, base.en, small.en, medium.en, large-v3
WHISPER_DEVICE = "cuda"              # "cuda" or "cpu"
WHISPER_COMPUTE_TYPE = "float16"     # float16 (GPU), int8 (CPU/low-end GPU)
WHISPER_BEAM_SIZE = 5                # Higher = more accurate but slower

# LLM for formatting (line 35)
MODEL_FORMATTER = "anthropic/claude-haiku-4.5"  # Fast and cost-effective
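
These constants map directly onto the faster-whisper API; loading and transcription look roughly like this (standard faster-whisper usage, though the surrounding code in wsprv2.py may differ):

# How the settings above feed into faster-whisper.
from faster_whisper import WhisperModel

model = WhisperModel(
    WHISPER_MODEL_NAME,                 # e.g. "small.en"
    device=WHISPER_DEVICE,              # "cuda" or "cpu"
    compute_type=WHISPER_COMPUTE_TYPE,  # "float16" on GPU, "int8" on CPU
)

# audio can be a file path or a NumPy float32 array of 16 kHz samples
segments, info = model.transcribe("recording.wav", beam_size=WHISPER_BEAM_SIZE)
text = " ".join(segment.text.strip() for segment in segments)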

Model Size vs Performance

Model       Size     Speed (GPU)   Accuracy    Use Case
tiny.en     39MB     Fastest       Good        Quick notes, drafts
base.en     74MB     Very fast     Better      General use
small.en    244MB    Fast          Great       Recommended
medium.en   769MB    Moderate      Excellent   High accuracy needed
large-v3    1.5GB    Slower        Best        Maximum accuracy

Microphone Settings

  • Saved config: mic_config.json stores your microphone selection
  • Reset config: Delete mic_config.json to reselect your microphone
  • Preferred mics: Edit MIC_PREFERRED_KEYWORDS in wsprv2.py (line 72)

Sound Effects

Place custom sound files in the sounds/ directory:

  • dictation-start.wav: Recording started
  • dictation-stop.wav: Recording stopped
  • Notification.wav: Errors or short recordings
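
On Windows, playing these cues can be as simple as the standard-library winsound module; a minimal sketch (the app's actual playback code may differ):

# Non-blocking sound cues via the stdlib winsound module (Windows only).
import winsound

def play_cue(filename):
    """Play a .wav from sounds/ without blocking the main thread."""
    winsound.PlaySound(
        f"sounds/{filename}",
        winsound.SND_FILENAME | winsound.SND_ASYNC,
    )

play_cue("dictation-start.wav")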

Troubleshooting

"CUDA not found" or GPU errors

  1. Verify CUDA installation: nvcc --version
  2. Check cuDNN files are in the correct location
  3. As a fallback, switch to CPU mode:
    WHISPER_DEVICE = "cpu"
    WHISPER_COMPUTE_TYPE = "int8"
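
You can also check whether faster-whisper's backend (ctranslate2) can see the GPU at all:

# Quick check that ctranslate2 (faster-whisper's backend) detects the GPU.
import ctranslate2

count = ctranslate2.get_cuda_device_count()
print(f"CUDA devices visible to ctranslate2: {count}")
# 0 means CUDA/cuDNN was not found - recheck the PATH entries or use CPU mode.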

Microphone not working

  1. Check Windows sound settings and ensure your mic is set as the default recording device
  2. Delete mic_config.json and restart to reselect
  3. Try selecting the "system default" option when prompted

Transcription quality issues

  1. Use a better microphone (USB mics recommended)
  2. Speak clearly and reduce background noise
  3. Try a larger Whisper model: medium.en or large-v3
  4. Increase beam size: WHISPER_BEAM_SIZE = 10

Slow transcription

  1. Ensure you're using CUDA (WHISPER_DEVICE = "cuda")
  2. Use a smaller model: tiny.en or base.en
  3. Check GPU usage with nvidia-smi during transcription

Import errors

# Reinstall dependencies
pip install --upgrade -r requirements.txt

# If a specific package fails, install it individually:
pip install faster-whisper --upgrade

Keys not detected

  • Ensure the app has focus (click the status badge)
  • Try running as Administrator
  • Check if another app is intercepting the hotkeys

Building an Executable (Optional)

To create a standalone .exe:

pip install pyinstaller
pyinstaller --onefile --windowed --add-data "sounds;sounds" wsprv2.py

The executable will be in the dist/ folder.

Project Structure

wsprflowpy/
├── wsprv2.py              # Main application
├── requirements.txt       # Python dependencies
├── .env                   # Environment variables (create from .env.example)
├── .env.example          # Example environment file
├── mic_config.json       # Microphone settings (auto-generated)
├── sounds/               # Sound effect files
│   ├── dictation-start.wav
│   ├── dictation-stop.wav
│   └── Notification.wav
└── README.md            # This file

How It Works

  1. Recording: Press Ctrl+Alt to start recording audio via sounddevice
  2. Transcription: Audio is processed by faster-whisper (local, no network needed)
  3. Formatting: Raw transcript is sent to Claude via OpenRouter for cleanup
  4. Context: Active window is detected to apply appropriate formatting style
  5. Pasting: Formatted text is copied to clipboard and pasted automatically
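
Condensed into code, the pipeline might look like the hedged sketch below. The OpenRouter call uses the OpenAI-compatible client; get_active_window_title and the system prompt are illustrative, not the exact code in wsprv2.py:

# End-to-end sketch: window context -> LLM formatting -> paste.
import ctypes
import os

import keyboard
import pyperclip
from openai import OpenAI

def get_active_window_title():
    """Read the foreground window title via the Win32 API (step 4)."""
    hwnd = ctypes.windll.user32.GetForegroundWindow()
    buf = ctypes.create_unicode_buffer(512)
    ctypes.windll.user32.GetWindowTextW(hwnd, buf, 512)
    return buf.value

def format_transcript(raw_text):
    """Send the raw transcript to Claude via OpenRouter (step 3)."""
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )
    response = client.chat.completions.create(
        model="anthropic/claude-haiku-4.5",
        messages=[
            {"role": "system",
             "content": f"Clean up this dictation for: {get_active_window_title()}"},
            {"role": "user", "content": raw_text},
        ],
    )
    return response.choices[0].message.content

def paste(text):
    """Copy to the clipboard and simulate Ctrl+V (step 5)."""
    pyperclip.copy(text)
    keyboard.send("ctrl+v")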

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details

Support

If you encounter issues:

  1. Check the Troubleshooting section
  2. Open an issue on GitHub with:
    • Error message
    • Python version (python --version)
    • CUDA version (nvcc --version)
    • GPU model

Note: This is an unofficial remake and is not affiliated with Wisprflow.