Show HN: Reproducible open-source STT API benchmarks with full methodology

github.com

1 point by jilijeanlouis 13 days ago · 1 comment

jilijeanlouis (OP) 13 days ago

Author here. We built this because we kept seeing different word error rates (WER) for the same models depending on who was testing and how.

Normalization rules turned out to be a major cause of the discrepancies, so we released a fully reproducible evaluation framework. The full repo lets you verify the numbers yourself.
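To see why normalization alone can swing reported WER, here is a minimal sketch. This is not the project's actual normalizer or scoring script; the `normalize` rules below (lowercasing, punctuation stripping, whitespace collapsing) are illustrative assumptions, and the WER is a standard word-level Levenshtein distance divided by reference length:

```python
import re


def wer(ref_words, hyp_words):
    """Word error rate: word-level edit distance / reference length."""
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp_words) + 1) for _ in range(len(ref_words) + 1)]
    for i in range(len(ref_words) + 1):
        d[i][0] = i
    for j in range(len(hyp_words) + 1):
        d[0][j] = j
    for i in range(1, len(ref_words) + 1):
        for j in range(1, len(hyp_words) + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref_words)


def normalize(text):
    """Hypothetical normalizer: lowercase, drop punctuation, collapse spaces."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", "", text)  # keep word chars and apostrophes
    return re.sub(r"\s+", " ", text).strip()


# Same transcript, scored two ways: the model output is correct,
# but casing and punctuation inflate the raw score.
ref = "Hello, Dr. Smith! It's 5 o'clock."
hyp = "hello dr smith it's 5 o'clock"

raw_wer = wer(ref.split(), hyp.split())
norm_wer = wer(normalize(ref).split(), normalize(hyp).split())
```

Here `raw_wer` penalizes five of six words for casing and punctuation differences, while `norm_wer` is zero: two evaluations of the same model output, two very different headline numbers, which is exactly why the normalization rules have to ship with the benchmark.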

It includes:
- the normalization rules we use
- scoring scripts
- dataset coverage (conversational, noisy, multilingual)
- the full eval pipeline

We also published a detailed comparison using this framework across 8 leading STT providers, 7 datasets, and 74 hours of audio. You can see it here: https://www.gladia.io/competitors/benchmarks

Feedback welcome!
