State of the Art Speech Recognition with MAI-Transcribe-1

The best price-to-performance of any large cloud provider

We are passing efficiency gains directly to customers: MAI-Transcribe-1 is priced at $0.36 per hour of audio, setting the standard for quality, speed, and price for production ASR.
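To make the pricing concrete, here is a minimal cost-estimation sketch. It assumes billing scales linearly with audio duration at the quoted $0.36 per hour; the actual billing granularity and rounding rules are not stated here, and the function name `estimate_cost_usd` is purely illustrative.

```python
from decimal import Decimal

# List price quoted in this article, in US dollars per hour of audio.
PRICE_PER_AUDIO_HOUR_USD = Decimal("0.36")


def estimate_cost_usd(audio_seconds: float) -> Decimal:
    """Estimate transcription cost for a given amount of audio.

    Assumption: cost is strictly proportional to duration. Real billing
    may round up or apply minimums that this sketch does not model.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    hours = Decimal(str(audio_seconds)) / Decimal(3600)
    # Round to whole cents for display purposes only.
    return (hours * PRICE_PER_AUDIO_HOUR_USD).quantize(Decimal("0.01"))


print(estimate_cost_usd(30 * 60))      # a 30-minute meeting: 0.18
print(estimate_cost_usd(1000 * 3600))  # a 1,000-hour archive: 360.00
```

Under this linear assumption, a 1,000-hour audio archive would cost roughly $360 to transcribe.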

Powering Microsoft Products

MAI-Transcribe-1 is in phased rollout in Copilot’s Voice mode and Microsoft Teams, where it provides accurate conversation transcripts that can be used for various downstream tasks.

Build with MAI-Transcribe-1

MAI-Transcribe-1 is now in public preview on Microsoft Foundry.

You can also experience MAI-Transcribe-1 in the newly launched Microsoft AI Playground.

MAI-Transcribe-1 delivers latency low enough for a wide range of use cases while providing very high accuracy.

Offline applications

MAI-Transcribe-1 supports a wide range of applications, from media and content tasks such as subtitle generation, podcast transcription, and video accessibility, to enterprise needs such as meeting archives, compliance recording, and legal discovery. It can also power analytics workflows, including call center QA, customer insight extraction, and searchable audio libraries, as well as large-scale data pipelines that process audio archives for ML training, search indexing, and summarization.

Online applications

Low latency also makes MAI-Transcribe-1 a good choice for real-time tasks such as meeting transcription, video closed captioning, and dictation.

Voice Agents: The complete stack

If you’re building a voice agent, MAI-Transcribe-1 is the foundational layer. Accurate transcription is what allows underlying LLMs to interpret intent effectively. It directly shapes user satisfaction and task completion rates.

By combining MAI-Transcribe-1 (speech-to-text) with MAI-Voice-1 (text-to-speech) and your chosen LLM, you can build a robust solution to power voice experiences.
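The three-stage stack described above can be sketched as a simple pipeline. This is an illustration of the architecture only, not the actual MAI-Transcribe-1 or MAI-Voice-1 API: the `VoiceAgent` class, the stage signatures, and the `fake_*` functions are all hypothetical stand-ins that you would replace with real client calls to your speech-to-text service, LLM, and text-to-speech service.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions, not a real SDK):
Transcriber = Callable[[bytes], str]   # speech-to-text: audio in, text out
Responder = Callable[[str], str]       # LLM: user text in, reply text out
Synthesizer = Callable[[str], bytes]   # text-to-speech: text in, audio out


@dataclass
class VoiceAgent:
    """Chains the three stages of a single conversational turn."""
    transcribe: Transcriber
    respond: Responder
    synthesize: Synthesizer

    def handle_turn(self, audio_in: bytes) -> bytes:
        transcript = self.transcribe(audio_in)  # 1. speech-to-text
        reply_text = self.respond(transcript)   # 2. LLM interprets intent
        return self.synthesize(reply_text)      # 3. text-to-speech


# Stand-in stages so the sketch runs without any network access.
# They treat the "audio" bytes as UTF-8 text purely for demonstration.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


agent = VoiceAgent(fake_transcribe, fake_respond, fake_synthesize)
reply_audio = agent.handle_turn(b"book a table for two")
print(reply_audio)  # b'You said: book a table for two'
```

Keeping each stage behind a plain callable makes it easy to swap the LLM or to test the pipeline with stubs. It also shows why transcription quality matters: any error in step 1 propagates into the LLM's input in step 2.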
