The best price-to-performance of any large cloud provider
We are passing efficiency gains directly to customers: MAI-Transcribe-1 is priced at $0.36 per hour of audio, setting the standard for quality, speed, and price for production ASR.
Powering Microsoft Products
MAI-Transcribe-1 is in phased rollouts with Copilot’s Voice mode and Microsoft Teams to provide accurate conversation transcripts, that can be used for various downstream tasks.
Build with MAI-Transcribe-1
MAI-Transcribe-1 is now in public preview on Microsoft Foundry.
You can also experience MAI-Transcribe-1 in the newly launched Microsoft AI Playground.
MAI-Transcribe-1 delivers latency low enough for a wide range of use cases while providing very high accuracy.
Offline applications
MAI-Transcribe-1 supports a wide range of applications, from media and content tasks such as subtitle generation, podcast transcription, and video accessibility, to enterprise needs such as meeting archives, compliance recording, and legal discovery. It can also power analytics workflows, including call center QA, customer insight extraction, and searchable audio libraries, as well as large scale data pipelines for processing audio archives used in ML training, search indexing, and summarization.
Online applications
Low latency also makes MAI-Transcribe-1 a good choice for real-time tasks. Be it meeting transcription, video close captioning, or dictation.
Voice Agents: The complete stack
If you’re building a voice agent, MAI-Transcribe-1 is the foundational layer. Accurate transcription is what allows underlying LLMs to interpret intent effectively. It directly shapes user satisfaction and task completion rates.
By combining MAI-Transcribe-1 (speech-to-text) with MAI-Voice-1 (text-to-speech) and your chosen LLM you can build a robust solution to power voice experiences.