Open ASR Leaderboard
This repository contains the code for the Open ASR Leaderboard. The leaderboard is a Gradio Space that allows users to compare the accuracy of ASR models on a variety of datasets. The leaderboard is hosted at hf-audio/open_asr_leaderboard.
Datasets
The Open ASR Leaderboard evaluates models on a diverse set of publicly available ASR benchmarks hosted on the Hugging Face Hub. These datasets cover a wide range of domains, languages, and recording conditions to provide a fair and comprehensive comparison across models.
- Main Test Sets (English, short-form): The main benchmark datasets used for evaluation (short-form English) are available here.
- English, long-form: The ASR Longform benchmark dataset includes earnings21 and earnings22. We also evaluate on CORAAL, but it is stored as a separate dataset since it has multiple splits.
- Multilingual Benchmark: The ASR Multilingual benchmark dataset includes fleurs, mcv and mls multilingual.
- Private datasets: After submitting a model to the leaderboard, the maintainers will evaluate on private sets, as described here.
Evaluate a model (as of 24 June 2026)
English short-form evaluations use Hugging Face Jobs to guarantee reproducibility: every run executes a Docker image on the same hardware, to minimize environment and driver differences. Multilingual and long-form evaluations will migrate to HF Jobs in the future.
Jobs are launched on the following hardware (flavor in HF Jobs terminology):
| name | pretty name | cpu | ram | storage | accelerator | cost/min | cost/hour |
| --- | --- | --- | --- | --- | --- | --- | --- |
| h200 | Nvidia H200 | 23 vCPU | 256 GB | 3000 GB | 1x H200 (141 GB) | $0.0833 | $5.00 |
Example costs for a full run over the main public datasets:
- $2.92 for nvidia/parakeet-tdt-0.6b-v3
- $4.75 for openai/whisper-large-v3-turbo
- $5.58 for Qwen/Qwen3-ASR-1.7B
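To put these figures in context, the sketch below converts each example cost into billed accelerator time using the hourly rate from the hardware table. It assumes the total cost is simply billed time multiplied by the h200 rate, which is an assumption rather than a statement of how HF Jobs invoices. Because jobs run in parallel (one per dataset), the result is summed job time, not wall-clock time.

```python
# Hourly rate copied from the hardware table above (USD per hour for the h200 flavor).
H200_COST_PER_HOUR = 5.00


def billed_minutes(total_cost_usd: float, cost_per_hour: float = H200_COST_PER_HOUR) -> float:
    """Convert a total cost into billed minutes, assuming cost = time * rate."""
    return total_cost_usd / cost_per_hour * 60.0


# Example costs listed above for a full run over the main public datasets.
example_costs = {
    "nvidia/parakeet-tdt-0.6b-v3": 2.92,
    "openai/whisper-large-v3-turbo": 4.75,
    "Qwen/Qwen3-ASR-1.7B": 5.58,
}

for model, cost in example_costs.items():
    print(f"{model}: ~{billed_minutes(cost):.0f} billed minutes")
```

Under that assumption, the three runs correspond to roughly 35, 57, and 67 billed minutes in total across all datasets.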
Each model family has its own Docker image with the necessary software requirements. The evaluation configurations are hosted as HF Spaces.
To launch an evaluation:
- Hugging Face Hub setup
  - Create an account at https://huggingface.co/ and add credits for HF Jobs: https://huggingface.co/settings/billing
  - Create a WRITE token and copy it.
  - Create a Storage Bucket to store results: https://huggingface.co/new-bucket
- One-time local setup
A local setup is needed to launch the evaluation and score with the repo's normalizer.
    # Clone the repository
    git clone git@github.com:huggingface/open_asr_leaderboard.git
    cd open_asr_leaderboard

    # Create a minimal conda environment (no GPU required locally)
    conda create -n leaderboard_jobs python=3.10 -y
    conda activate leaderboard_jobs
    pip install -r requirements/requirements_jobs.txt
    huggingface-cli login  # paste your WRITE token when prompted
- Launch an evaluation 🚀
    # Open the relevant submit_jobs script, uncomment the models/datasets you want, then run:
    RESULTS_BUCKET="<your-bucket>" HF_TOKEN=hf_... bash qwen/submit_jobs.sh
    # Jobs are submitted in parallel (one per dataset). The script waits for all
    # jobs to finish, syncs results from the bucket, and prints a CSV summary.

    # Billing to org
    ORG_NAME="<org-name>" RESULTS_BUCKET="<your-bucket>" HF_TOKEN=hf_... bash qwen/submit_jobs.sh
Local evaluation
For contributors who want to test locally or evaluate multilingual/long-form models before HF Jobs support is added, the requirements/ folder contains per-family dependency files. The Dockerfiles in the HF Spaces can also be used to build a local container.
Each model family has a run_eval.py entry point driven by a corresponding bash script (e.g. run_whisper.sh). The script outputs a JSONL file with predictions and prints WER and RTFx after completion. See the sub-folders of this repo for examples; the latest scripts are in the HF Spaces linked above.
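For readers unfamiliar with the two reported metrics, here is a minimal, self-contained sketch of how WER and RTFx can be computed from a predictions file. It is not the repository's scoring code: it skips the repo's text normalizer, and the JSONL keys (`reference`, `prediction`, `audio_seconds`, `inference_seconds`) are hypothetical names chosen for illustration. WER is taken as word-level edit distance divided by the number of reference words, and RTFx as total audio duration divided by total transcription time.

```python
import json


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two word sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def corpus_wer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level WER: total word errors over total reference words."""
    errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        errors += word_edit_distance(ref_words, hyp.split())
        total_words += len(ref_words)
    return errors / total_words


def rtfx(audio_seconds: list[float], inference_seconds: list[float]) -> float:
    """Inverse real-time factor: seconds of audio processed per second of compute."""
    return sum(audio_seconds) / sum(inference_seconds)


def score_jsonl(lines):
    """Score JSONL records; the key names are hypothetical, not the repo's schema."""
    records = [json.loads(line) for line in lines if line.strip()]
    wer = corpus_wer(
        [r["reference"] for r in records],
        [r["prediction"] for r in records],
    )
    speed = rtfx(
        [r["audio_seconds"] for r in records],
        [r["inference_seconds"] for r in records],
    )
    return wer, speed


# Two made-up records, for illustration only.
example_lines = [
    '{"reference": "the cat sat on the mat", "prediction": "the cat sat on mat", '
    '"audio_seconds": 3.0, "inference_seconds": 0.01}',
    '{"reference": "hello world", "prediction": "hello word", '
    '"audio_seconds": 1.0, "inference_seconds": 0.01}',
]

wer, speed = score_jsonl(example_lines)
print(f"WER: {100 * wer:.2f}%  RTFx: {speed:.1f}")
```

In the example, one deleted word and one substituted word over eight reference words give a WER of 25%. A real run would normalize both reference and prediction text before scoring, which this sketch omits.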
Trade-off plots
For open-source models, you can generate trade-off plots with scripts/plot_all.sh.
You can highlight a particular model (see scripts/data for CSV results as of 26 March 2026):
./scripts/plot_all.sh --highlight "model_name"
# for example
./scripts/plot_all.sh --highlight "nvidia/parakeet-tdt-0.6b-v3"
You can also specify your own model and its performance as follows:
./scripts/plot_all.sh --custom-model "MY MODEL" --model-size 2.0 --en-shortform-wer 5.5 --en-shortform-rtfx 1000
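One way to read an accuracy-versus-speed trade-off is to check whether a model is Pareto-optimal: no other model has both a lower (or equal) WER and a higher (or equal) RTFx. The sketch below illustrates that idea only; it is not what scripts/plot_all.sh does. The "MY MODEL" values come from the command above, while the `placeholder-*` entries and their numbers are invented for illustration.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    wer: float   # word error rate, lower is better
    rtfx: float  # inverse real-time factor, higher is better


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.wer <= b.wer and a.rtfx >= b.rtfx
    strictly_better = a.wer < b.wer or a.rtfx > b.rtfx
    return no_worse and strictly_better


def pareto_front(points: list[Point]) -> list[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


points = [
    Point("MY MODEL", 5.5, 1000.0),       # values from the command above
    Point("placeholder-a", 5.0, 200.0),   # invented: more accurate, slower
    Point("placeholder-b", 7.0, 800.0),   # invented: less accurate and slower
    Point("placeholder-c", 6.0, 3000.0),  # invented: less accurate, faster
]

for p in pareto_front(points):
    print(f"{p.name}: WER={p.wer}, RTFx={p.rtfx}")
```

Here `placeholder-b` drops out because "MY MODEL" beats it on both axes, while the other three each win on at least one axis.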
Contributing a model or dataset
Please follow the pull request template; it contains a submission checklist and guidelines.
Citation
@misc{srivastav2025openasrleaderboardreproducible,
      title={Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation},
      author={Vaibhav Srivastav and Steven Zheng and Eric Bezzam and Eustache Le Bihan and Nithin Koluguri and Piotr Żelasko and Somshubra Majumdar and Adel Moumen and Sanchit Gandhi},
      year={2025},
      eprint={2510.06961},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.06961},
}


