Overview
MiniMax’s T2A (Text-to-Audio) API provides high-quality text-to-speech synthesis with streaming output, emotional voice control, and support for multiple languages. The service offers several models that trade latency against audio quality, ranging from low-latency variants to high-definition output.
- API Reference - Complete API documentation and method details
- MiniMax T2A Docs - Official MiniMax T2A API documentation
- Example Code - Working example with emotional voice settings
Installation
To use MiniMax services, no additional dependencies are required beyond the base installation. You also need to set the following environment variables:
- MINIMAX_API_KEY
- MINIMAX_GROUP_ID
Get your API credentials from the MiniMax Platform.
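A minimal sketch of loading these credentials in Python, assuming they are exported as environment variables with the names above. The helper load_minimax_credentials is hypothetical; it fails early with a message that names every missing variable instead of passing None on to the service.

```python
import os
from typing import Mapping, Optional, Tuple

# Variable names taken from the list above.
REQUIRED_VARS = ("MINIMAX_API_KEY", "MINIMAX_GROUP_ID")


def load_minimax_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Hypothetical helper: return (api_key, group_id) or raise if either is missing."""
    env = os.environ if env is None else env
    # Collect all missing or empty variables so the error reports them together.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing MiniMax credentials: {', '.join(missing)}")
    return env["MINIMAX_API_KEY"], env["MINIMAX_GROUP_ID"]
```

Passing an explicit mapping (as the env argument) makes the helper easy to test without touching the real process environment.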
Frames
Input
- TextFrame - Text content to synthesize into speech
- TTSSpeakFrame - Text that should be spoken immediately
- TTSUpdateSettingsFrame - Runtime configuration updates
- LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries
Output
- TTSStartedFrame - Signals the start of synthesis
- TTSAudioRawFrame - Generated audio data chunks (streaming PCM)
- TTSStoppedFrame - Signals the completion of synthesis
- ErrorFrame - API or processing errors
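For a successful utterance, the output frames are expected to arrive as one TTSStartedFrame, a run of TTSAudioRawFrame chunks, and then one TTSStoppedFrame. The sketch below checks that ordering; it is a hypothetical test helper that works on frame type names as plain strings rather than on Pipecat's frame classes, and it assumes at least one audio chunk per utterance. Error paths (ErrorFrame) are not modeled.

```python
from typing import Sequence


def is_well_formed_utterance(frame_names: Sequence[str]) -> bool:
    """Hypothetical check: TTSStartedFrame, one or more TTSAudioRawFrame, TTSStoppedFrame."""
    # Shortest valid sequence is start + one audio chunk + stop.
    if len(frame_names) < 3:
        return False
    first, *middle, last = frame_names
    return (
        first == "TTSStartedFrame"
        and last == "TTSStoppedFrame"
        and all(name == "TTSAudioRawFrame" for name in middle)
    )
```

In a test pipeline you could record type(frame).__name__ for each frame the TTS service emits and feed that list to this check.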
Model Comparison
Model | Quality | Latency | Features |
---|---|---|---|
speech-02-hd | Highest | Higher | Superior rhythm and stability |
speech-02-turbo | High | Lower | Enhanced multilingual capabilities |
speech-01-hd | High | Medium | Rich voices with expressive emotions |
speech-01-turbo | Good | Lowest | Regular updates, fast response |
Refer to the MiniMax documentation for up-to-date model information.
Voice Selection
MiniMax offers diverse voice personalities:
Voice ID | Description | Tone |
---|---|---|
Wise_Woman | Mature female voice | Authoritative, knowledgeable |
Friendly_Person | Warm, approachable | Conversational, welcoming |
Patient_Man | Calm male voice | Steady, reassuring |
Lively_Girl | Young female voice | Energetic, enthusiastic |
Deep_Voice_Man | Rich male voice | Professional, commanding |
Calm_Woman | Serene female voice | Peaceful, soothing |
Elegant_Man | Sophisticated male | Refined, articulate |
See the MiniMax documentation for the complete list of available voices.
Supported Sample Rates
MiniMax supports multiple sample rates for different quality levels:
- 8000 Hz
- 16000 Hz
- 22050 Hz
- 24000 Hz
- 32000 Hz
- 44100 Hz
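Because TTSAudioRawFrame carries raw PCM, the playback length of a chunk follows directly from its size: duration = bytes / (sample_rate × channels × bytes_per_sample). A minimal sketch, assuming 16-bit (2-byte) mono samples by default; if your frames report a different channel count or sample width, pass those values instead. The function name is hypothetical.

```python
# Sample rates from the list above.
SUPPORTED_SAMPLE_RATES = (8000, 16000, 22050, 24000, 32000, 44100)


def pcm_duration_seconds(
    num_bytes: int,
    sample_rate: int,
    num_channels: int = 1,
    bytes_per_sample: int = 2,  # assumption: 16-bit PCM
) -> float:
    """Playback duration of a raw PCM buffer in seconds."""
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"Unsupported MiniMax sample rate: {sample_rate}")
    bytes_per_second = sample_rate * num_channels * bytes_per_sample
    return num_bytes / bytes_per_second
```

For example, at 24000 Hz one second of 16-bit mono audio is 24000 × 2 = 48,000 bytes, so a 48,000-byte chunk plays for exactly one second.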
Language Support
Language Code | Description | Service Code |
---|---|---|
Language.AR | Arabic | Arabic |
Language.CS | Czech | Czech |
Language.DE | German | German |
Language.EL | Greek | Greek |
Language.EN | English | English |
Language.ES | Spanish | Spanish |
Language.FI | Finnish | Finnish |
Language.FR | French | French |
Language.HI | Hindi | Hindi |
Language.ID | Indonesian | Indonesian |
Language.IT | Italian | Italian |
Language.JA | Japanese | Japanese |
Language.KO | Korean | Korean |
Language.NL | Dutch | Dutch |
Language.PL | Polish | Polish |
Language.PT | Portuguese | Portuguese |
Language.RO | Romanian | Romanian |
Language.RU | Russian | Russian |
Language.TH | Thai | Thai |
Language.TR | Turkish | Turkish |
Language.UK | Ukrainian | Ukrainian |
Language.VI | Vietnamese | Vietnamese |
Language.YUE | Chinese (Cantonese) | Chinese,Yue |
Language.ZH | Chinese (Mandarin) | Chinese |
Commonly used languages:
- Language.EN - English
- Language.ZH - Chinese (Mandarin)
- Language.ES - Spanish
- Language.FR - French
- Language.DE - German
- Language.JA - Japanese
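The Service Code column in the language table is the value the service uses for each language; note that Cantonese maps to the combined code "Chinese,Yue" rather than a single word. The service performs this conversion itself, so the sketch below only illustrates the mapping with a plain dictionary keyed by the suffix of each Language code (for example "EN" for Language.EN). The dictionary and function names are hypothetical.

```python
from typing import Optional

# Mirrors the "Service Code" column of the language table.
MINIMAX_SERVICE_CODES = {
    "AR": "Arabic", "CS": "Czech", "DE": "German", "EL": "Greek",
    "EN": "English", "ES": "Spanish", "FI": "Finnish", "FR": "French",
    "HI": "Hindi", "ID": "Indonesian", "IT": "Italian", "JA": "Japanese",
    "KO": "Korean", "NL": "Dutch", "PL": "Polish", "PT": "Portuguese",
    "RO": "Romanian", "RU": "Russian", "TH": "Thai", "TR": "Turkish",
    "UK": "Ukrainian", "VI": "Vietnamese", "YUE": "Chinese,Yue", "ZH": "Chinese",
}


def to_service_code(language_code: str) -> Optional[str]:
    """Map "EN" or "Language.EN" (any case) to its MiniMax service code, else None."""
    key = language_code.upper().removeprefix("LANGUAGE.")  # requires Python 3.9+
    return MINIMAX_SERVICE_CODES.get(key)
```

Returning None for unknown codes leaves the fallback policy (raise, or default to English) to the caller.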
Usage Example
Basic Configuration
Initialize the MiniMaxHttpTTSService and use it in a pipeline.
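Check the API reference for the exact constructor signature of MiniMaxHttpTTSService. As a self-contained sketch, the hypothetical dataclass below bundles the choices described on this page (model, voice, sample rate, English normalization) and validates what the page states firmly, namely the supported sample rates, before you map the values onto the service's own arguments together with your credentials and an aiohttp.ClientSession. The field names are assumptions and may not match the service's keyword arguments.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict

# Sample rates listed under "Supported Sample Rates".
SUPPORTED_SAMPLE_RATES = frozenset({8000, 16000, 22050, 24000, 32000, 44100})


@dataclass(frozen=True)
class MiniMaxTTSSettings:
    """Hypothetical settings bundle; field names are illustrative, not the service's API."""

    # speech-02-turbo: lower latency, enhanced multilingual capabilities (model table).
    # Switch to speech-02-hd when quality matters more than response time.
    model: str = "speech-02-turbo"
    # Any voice ID from MiniMax's list; the voice table above is only a sample,
    # so IDs are not checked against it.
    voice_id: str = "Calm_Woman"
    sample_rate: int = 24000
    # Optional English normalization for numbers and abbreviations.
    english_normalization: bool = False

    def __post_init__(self) -> None:
        if not self.model or not self.voice_id:
            raise ValueError("model and voice_id must be non-empty")
        if self.sample_rate not in SUPPORTED_SAMPLE_RATES:
            raise ValueError(f"Unsupported sample rate: {self.sample_rate} Hz")

    def as_dict(self) -> Dict[str, Any]:
        return asdict(self)
```

For example, MiniMaxTTSSettings(model="speech-02-hd", voice_id="Wise_Woman", sample_rate=44100) picks the highest-quality model from the table, while an unsupported rate such as 48000 Hz is rejected before any request is made.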
Dynamic Configuration
Update settings at runtime by pushing a TTSUpdateSettingsFrame for the MiniMaxHttpTTSService.
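Conceptually, a TTSUpdateSettingsFrame carries a mapping of setting names to new values, which the service merges into its current configuration for subsequent synthesis. The sketch below models that merge with plain dictionaries; the key names (voice_id, speed) and the policy of rejecting unknown keys are assumptions for illustration, not the service's documented behavior.

```python
from typing import Any, Dict, Mapping


def merge_settings(current: Mapping[str, Any], update: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical merge of a runtime settings update into the current settings."""
    unknown = set(update) - set(current)
    if unknown:
        # Fail loudly rather than silently ignoring a misspelled key.
        raise ValueError(f"Unknown setting(s): {', '.join(sorted(unknown))}")
    merged = dict(current)  # copy so the caller's mapping is left unchanged
    merged.update(update)
    return merged


# Hypothetical keys, for illustration only.
current = {"voice_id": "Calm_Woman", "speed": 1.0}
print(merge_settings(current, {"voice_id": "Wise_Woman"}))
```

With Pipecat itself you would push the frame instead, typically as TTSUpdateSettingsFrame(settings={...}); check the API reference for the keys MiniMaxHttpTTSService accepts.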
Metrics
The service provides the following metrics:
- Time to First Byte (TTFB) - Latency from text input to the first audio chunk
- Processing Duration - Total synthesis time
- Character Usage - Number of characters processed, used for billing
Learn how to enable Metrics in your Pipeline.
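Pipecat collects these metrics for you once metrics are enabled. To make TTFB and processing duration concrete, here is a self-contained sketch that times any async stream of audio chunks; fake_tts_stream is a hypothetical stand-in for the service's streaming response, and measure_stream is not part of Pipecat.

```python
import asyncio
import time
from typing import AsyncIterator, Optional, Tuple


async def measure_stream(chunks: AsyncIterator[bytes]) -> Tuple[Optional[float], float, int]:
    """Return (ttfb_seconds, total_seconds, total_bytes) for a stream of audio chunks."""
    start = time.perf_counter()
    ttfb: Optional[float] = None
    total_bytes = 0
    async for chunk in chunks:
        if ttfb is None:
            # Time to first byte: delay until the first audio chunk arrives.
            ttfb = time.perf_counter() - start
        total_bytes += len(chunk)
    # Processing duration: time until the stream is exhausted.
    return ttfb, time.perf_counter() - start, total_bytes


async def fake_tts_stream() -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response: two PCM chunks after a delay."""
    await asyncio.sleep(0.05)
    yield b"\x00" * 4800
    yield b"\x00" * 4800


if __name__ == "__main__":
    ttfb, total, size = asyncio.run(measure_stream(fake_tts_stream()))
    print(f"TTFB {ttfb:.3f}s, total {total:.3f}s, {size} bytes")
```

TTFB is always less than or equal to the total duration; a large gap between the two usually means long text producing many chunks rather than a slow first response.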
Additional Notes
- HTTP Session Required: You must provide an aiohttp.ClientSession for API communication.
- Emotional AI: Expressive emotional delivery, with optimizations specific to each voice.
- Text Normalization: Optional English normalization for better handling of numbers and abbreviations.