MAI-Transcribe-1
Version: 2026-01-23
Microsoft
Last updated April 2026
MAI-Transcribe-1 is an automatic speech recognition (ASR) model built for high-quality batch transcription of spoken audio. It is designed for high accuracy across 25 languages and to adapt to diverse accents, dialects, and regional speech patterns.

MAI Audio Models

MAI‑Transcribe‑1 is a best-in-class speech‑to‑text model, designed for real‑world audio. It provides consistently strong transcription accuracy across accents, speaking styles, and noisy environments, giving developers a strong foundation for building high‑quality voice understanding into their applications.

Key capabilities

About this model

MAI‑Transcribe‑1 is a speech‑to‑text model built in‑house by the Microsoft AI Superintelligence team, designed to deliver reliable transcription across 25 languages. It powers a wide range of use cases, including video captions, meeting transcription, accessibility tools, call analysis, content creation workflows, and voice agents. The model is optimized to be robust across diverse accents, dialects, and real‑world acoustic conditions, giving developers a transcription system they can rely on. MAI‑Transcribe‑1 is under active development, with new capabilities planned, including real‑time transcription, diarization, and context biasing.

Key model capabilities

  • Best-in-class accuracy across 25 languages: English, French, German, Italian, Spanish, Hindi, Portuguese, Czech, Danish, Finnish, Hungarian, Dutch, Polish, Romanian, Swedish, Japanese, Korean, Chinese, Arabic, Indonesian, Russian, Thai, Turkish, and Vietnamese.
  • Robust to real-world noisy situations.
  • Automatic language identification.

Use cases

See Responsible AI for additional considerations for responsible use.

Key use cases

Use case | Scenario | Solution
Live captions | A virtual event platform provides real-time captions for webinars. | Chunk audio and transcribe spoken content into captions displayed live during the event.
Call center transcription | A call center wants accurate, fast transcriptions of customer calls to empower their customer service agents. | Transcribe calls in real time, enabling agents to better understand and respond to customer queries.
Video subtitling | A video-hosting platform needs to generate subtitles for uploaded videos. | Transcribe the full video audio to produce a complete subtitle track.
Accessibility | An organization needs to make audio content accessible to deaf or hard-of-hearing users. | Transcribe audio from meetings, announcements, or media to provide text alternatives that support compliance and inclusive access.
E-learning | An e-learning platform provides transcriptions for video lectures. | Process prerecorded lecture videos, generating text transcripts for students.
Media archiving | A media company needs subtitles for a large archive of videos. | Transcribe video files in bulk, generating accurate subtitles for each video.
Market research | A research firm analyzes customer feedback from audio recordings. | Convert audio feedback into text, enabling easier analysis and insights extraction.
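
The "chunk audio" step in the live captions row can be sketched as follows. This is a minimal illustration using only Python's standard `wave` module, assuming uncompressed WAV input; the function name `split_wav` and the 30-second default are hypothetical choices, not part of the service. The transcription request itself is deliberately omitted because its shape is not described on this page.

```python
import io
import wave


def split_wav(path_or_file, chunk_seconds=30.0):
    """Yield (index, wav_bytes) pairs, each holding at most chunk_seconds of audio.

    Hypothetical helper: every chunk is a complete WAV file in memory, so it
    can be sent to a batch transcription request on its own.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    with wave.open(path_or_file, "rb") as src:
        frames_per_chunk = max(1, int(src.getframerate() * chunk_seconds))
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                # Copy the source audio parameters so the chunk plays back identically.
                dst.setnchannels(src.getnchannels())
                dst.setsampwidth(src.getsampwidth())
                dst.setframerate(src.getframerate())
                dst.writeframes(frames)
            yield index, buf.getvalue()
            index += 1
```

Fixed-length cuts can split a word across two chunks; a real captioning pipeline would likely cut on silence or overlap adjacent chunks, which this sketch does not attempt.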

Out of scope use cases

Real‑time transcription, diarization, and biasing aren't supported yet; these capabilities are planned for an upcoming release.

Pricing

Pay-As-You-Go and Commitment Tiers. See pricing details here.

Technical specs

This information is not available.

Training cut-off date

This information is not available.

Input formats

LLM Speech: WAV, MP3, FLAC
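
A client can reject unsupported files before uploading them. The sketch below checks only the file extension against the three formats listed above; the names `SUPPORTED_EXTENSIONS` and `is_supported_audio` are hypothetical, and the check does not inspect the actual audio encoding.

```python
from pathlib import Path

# Extensions for the formats listed on this page (WAV, MP3, FLAC).
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac"}


def is_supported_audio(path):
    """Return True if the file extension matches a listed input format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_audio("call.WAV")` returns `True`, while `is_supported_audio("clip.ogg")` returns `False`.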

Supported languages

English, French, German, Italian, Spanish, Hindi, Portuguese, Czech, Danish, Finnish, Hungarian, Dutch, Polish, Romanian, Swedish, Japanese, Korean, Chinese, Arabic, Indonesian, Russian, Thai, Turkish, and Vietnamese.

Supported Azure regions

Global access is enabled, but for now resources must point to East US or West US. Additional regions are planned.
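
A deployment script can guard against an unsupported region early. This sketch assumes the usual Azure convention that a region identifier is the display name in lowercase with spaces removed (`eastus`, `westus`); `SUPPORTED_REGIONS` and `check_region` are hypothetical names, and the set would need updating as more regions are added.

```python
# Regions named on this page, written as Azure-style identifiers (assumption).
SUPPORTED_REGIONS = {"eastus", "westus"}


def check_region(name):
    """Normalize a region name and raise ValueError if it is not listed."""
    region = name.replace(" ", "").lower()
    if region not in SUPPORTED_REGIONS:
        allowed = ", ".join(sorted(SUPPORTED_REGIONS))
        raise ValueError(f"Region {name!r} is not listed; expected one of: {allowed}")
    return region
```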

Sample JSON response

Please refer to the sample JSON for LLM Speech according to your usage.

Model architecture

Autoregressive model with text prediction

Optimizing model performance

Coming soon.

Additional assets

This information is not available.

Distribution

You can deploy MAI-Transcribe-1 via Azure Speech in the cloud or on-premises. If you can't use the Speech SDK, you can access the Speech service through REST APIs instead, for example the REST APIs for LLM Speech.

More information

Learn more in the full Azure Speech Service documentation.

Responsible AI considerations

Safety techniques

Refer to the guidance for integration and responsible use with speech to text.

Safety evaluations

This information is not available.

Known limitations

MAI-Transcribe-1 recognizes what is spoken in an audio input and generates a transcription. Accuracy depends on configuring the service for the languages and speaking styles expected in the audio; non-optimal settings might lead to lower accuracy. Refer to Technical limitations, operational factors, and ranges for more details.

Acceptable use

Acceptable use policy

The Speech to Text API powered by MAI-Transcribe-1 offers convenient options for developing voice-enabled applications, but it is very important to consider the context in which you will integrate the API. You must ensure that you comply with all laws and regulations that apply to your application. This includes understanding your obligations under privacy and communication laws, including national and regional privacy, eavesdropping, and wiretap laws that apply to your jurisdiction. Collect and process only audio that is within the reasonable expectations of your users. This includes ensuring that you have all necessary and appropriate consents from users for you to collect, process, and store their audio data. Refer to Technical limitations, operational factors, and ranges for more details.

Terms of Service

Terms of Service Link

MAI-Transcribe-1 is provided under Microsoft’s proprietary licensing terms. Access to the model is subscription-based and governed by Microsoft’s product licensing policies.

Model Specifications

License: Custom
Last Updated: April 2026
Input Type: Audio
Output Type: Text
Provider: Microsoft