Show HN: TTSLab – A voice AI agent and TTS lab running in the browser via WebGPU

ttslab.dev

5 points by MbBrainz 11 days ago · 3 comments

I built TTSLab — a free, open-source tool for running text-to-speech and speech-to-text models directly in the browser using WebGPU and WASM.

No API keys, no backend, no data leaves your machine.

When you open the site, you'll hear it immediately — the landing page auto-generates speech from three different sentences right in your browser, no setup required.

You can then try any model yourself: type text, hit generate, hear it instantly. Models download once and get cached locally.
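For a sense of the plumbing involved: in-browser TTS models typically hand back raw Float32 PCM plus a sampling rate, and one way to make that playable or downloadable is to wrap it in a minimal WAV header. A hedged sketch (not TTSLab's actual code — the function name and layout are mine):

```javascript
// Wrap raw Float32 PCM (what in-browser TTS models typically emit)
// in a minimal 16-bit mono WAV container so it can be played or saved.
function floatPcmToWav(samples, sampleRate) {
  const dataSize = samples.length * 2;
  const buf = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buf);
  const writeStr = (off, s) => {
    for (let i = 0; i < s.length; i++) view.setUint8(off + i, s.charCodeAt(i));
  };
  writeStr(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);   // RIFF chunk size
  writeStr(8, 'WAVE');
  writeStr(12, 'fmt ');
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // audio format: PCM
  view.setUint16(22, 1, true);              // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true);              // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeStr(36, 'data');
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to [-1, 1]
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buf);
}
```

In the browser, the resulting bytes can go straight into a Blob and an `<audio>` element's `src`.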

The most experimental feature: a fully in-browser Voice Agent. It chains speech-to-text → LLM → text-to-speech, all running locally on your GPU via WebGPU. You can have a spoken conversation with an AI without a single network request.

Currently supported models:

- TTS: Kokoro 82M, SpeechT5, Piper (VITS)
- STT: Whisper Tiny, Whisper Base

Other features:

- Side-by-side model comparison
- Speed benchmarking on your hardware
- Streaming generation for supported models

Source: https://github.com/MbBrainz/ttslab (MIT)

Feedback I'd especially like:

1. How does performance feel on your hardware?
2. What models should I add next?
3. Did the Voice Agent work for you? That's the most experimental part.

Built on top of ONNX Runtime Web (https://onnxruntime.ai) and Transformers.js — huge thanks to those communities for making in-browser ML inference possible.

MbBrainzOP 11 days ago

Maker here. A few technical notes that might be interesting to this crowd:

The Voice Agent chains three models in the browser: Whisper for STT → a local LLM → Kokoro/SpeechT5 for TTS. All inference runs on-device via WebGPU. The latency isn't amazing yet, but the fact that it works at all with zero backend is kind of wild.
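The chain itself is just async composition; the hard parts are model loading and latency. A minimal sketch of the orchestration (names are mine, not TTSLab's, and the real pipeline also has to handle VAD, streaming, and interruption):

```javascript
// Hedged sketch: the STT → LLM → TTS chain as plain async composition.
// The stages are injected, so the same orchestration works whether they
// are WebGPU-backed pipelines in the browser or simple stubs in a test.
function createVoiceAgent({ stt, llm, tts }) {
  return async function turn(audioChunk) {
    const { text } = await stt(audioChunk); // speech → text (e.g. Whisper)
    const reply = await llm(text);          // text → reply (local LLM)
    const audio = await tts(reply);         // reply → raw PCM (e.g. Kokoro)
    return { heard: text, reply, audio };
  };
}
```

Keeping the stages injectable is also what makes it easy to swap models (Whisper Tiny vs. Base, Kokoro vs. SpeechT5) without touching the loop.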

The landing page has an auto-playing demo that generates speech locally as soon as you visit — you'll hear it typewrite and speak three sentences. That was important to me because "runs in your browser" sounds like marketing until you actually hear it happen.

Happy to go deep on the WebGPU inference pipeline, model conversion process, or anything else.

nshelia 10 days ago

Looks really dope. Does it use a VAD like Silero locally as well?

Performance is really good on my M2 Pro. Model downloads still take time even on fiber, but it's fine.

  • MbBrainzOP 10 days ago

    Amazing, I'm happy you like it! The voice agent does indeed use Silero VAD (v5). There's an ONNX file available for the underlying VAD model, so we can run it with onnxruntime-web just like we run the TTS and STT models!
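For anyone curious: the VAD session emits a per-frame speech probability, and the start/stop decision is ordinary JS layered on top. A hedged sketch of that gating logic (names and thresholds are illustrative, not TTSLab's actual values):

```javascript
// Turn per-frame speech probabilities (e.g. from Silero VAD) into
// start/end events, with a "hangover" so brief pauses don't cut speech.
// Thresholds and frame counts here are illustrative defaults only.
function createVadGate({ startThreshold = 0.5, endThreshold = 0.35, hangoverFrames = 8 } = {}) {
  let speaking = false;
  let silentRun = 0;
  return function step(prob) {
    if (!speaking) {
      if (prob >= startThreshold) { speaking = true; silentRun = 0; return 'start'; }
      return 'silence';
    }
    if (prob < endThreshold) {
      silentRun += 1;
      if (silentRun >= hangoverFrames) { speaking = false; return 'end'; }
    } else {
      silentRun = 0; // speech resumed, reset the hangover counter
    }
    return 'speech';
  };
}
```

An 'end' event is what would trigger handing the buffered audio to the STT stage.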
