Show HN: Local audio transcription and speaker ID for Apple Silicon (github.com)
I built a tool combining MLX Whisper + pyannote for fast local audio transcription with speaker diarization on Apple Silicon.
Key benefits: privacy-first (fully local), hardware-accelerated, automatic speaker identification, multiple output formats (TXT/SRT/JSON).
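For anyone curious what the SRT output with speaker labels could look like, here is a minimal sketch. The segment dict keys (`start`, `end`, `speaker`, `text`) and the function names are hypothetical and only for illustration, not necessarily what the tool uses internally.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render speaker-labelled segments as SRT cues.

    `segments` is assumed to be a list of dicts with hypothetical keys
    'start', 'end' (seconds), 'speaker', and 'text'.
    """
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"[{seg['speaker']}] {seg['text'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

Putting the speaker label in brackets at the start of each cue is just one convention; the JSON output could carry the same fields unformatted.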
The main technical challenge was getting MLX Whisper and pyannote to work together despite their different audio processing. I solved this with a preprocessing pipeline.
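A rough sketch of the kind of preprocessing step this implies: normalize the audio once to 16 kHz mono float32 (the format Whisper models expect) and hand the same array to both stages. This assumes NumPy is available; `to_mono_16k` is a hypothetical helper, and the linear-interpolation resampling is a stand-in for illustration, not the tool's actual method.

```python
import numpy as np

TARGET_SR = 16_000  # Whisper models expect 16 kHz mono input


def to_mono_16k(samples: np.ndarray, sample_rate: int) -> np.ndarray:
    """Downmix to mono and resample to 16 kHz float32.

    `samples` is assumed to be shaped (frames,) or (frames, channels).
    Linear interpolation does no anti-alias filtering, so a real pipeline
    would more likely use ffmpeg or a dedicated resampler.
    """
    audio = np.asarray(samples, dtype=np.float32)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # average channels into one
    if sample_rate != TARGET_SR:
        n_out = int(round(len(audio) * TARGET_SR / sample_rate))
        src_t = np.arange(len(audio)) / sample_rate  # original sample times
        dst_t = np.arange(n_out) / TARGET_SR         # target sample times
        audio = np.interp(dst_t, src_t, audio).astype(np.float32)
    return audio
```

With one normalized array, both models see the same timeline, so the diarization timestamps line up with the transcript segments.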
Perfect for interviews, meetings, podcasts. Handles HuggingFace gated models with proper error handling.
Surprised this didn't get more traction, as it's really interesting.
Is there a reason it's Apple Silicon-only? I don't know the technical details of MLX, or whether it runs (or can be made to run) on other hardware.
Also, why does the HF token need to be in an environment variable and passed on the command line?