Overview
Sarvam AI provides text-to-speech synthesis specialized for Indian languages and voices. The service offers voice customization through pitch, pace, and loudness controls, supports multiple Indian languages, and preprocesses mixed-language content.
- API Reference - Complete API documentation and method details
- Sarvam AI Docs - Official Sarvam AI text-to-speech API documentation
- Example Code - Working example with Indian language support
Installation
To use Sarvam AI services, no additional dependencies are required beyond the base installation.
Set your Sarvam API key as an environment variable: SARVAM_API_KEY. Get your API key from the Sarvam AI Console.
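A minimal sketch of loading the key at startup, assuming SARVAM_API_KEY is read from the process environment (the usual convention for a name like this). The helper name `load_sarvam_api_key` is hypothetical and not part of any library; it only shows how to fail early with a clear message when the key is missing.

```python
import os


def load_sarvam_api_key(env_var: str = "SARVAM_API_KEY") -> str:
    """Return the API key from the environment, failing early if it is unset.

    Hypothetical helper for illustration; not part of Sarvam AI or any SDK.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before starting the pipeline"
        )
    return key
```

Calling this once before constructing the service keeps a missing key from surfacing later as an opaque API error.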
Frames
Input
- TextFrame - Text content to synthesize into speech
- TTSSpeakFrame - Text that should be spoken immediately
- TTSUpdateSettingsFrame - Runtime configuration updates
- LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries
Output
- TTSStartedFrame - Signals start of synthesis
- TTSAudioRawFrame - Generated audio data (PCM, WAV header stripped)
- TTSStoppedFrame - Signals completion of synthesis
- ErrorFrame - API or processing errors
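To make "PCM, WAV header stripped" concrete, the sketch below uses Python's standard `wave` module to separate raw PCM samples from a WAV container. This is an illustration of the idea only, not the service's actual implementation; the function names `wav_bytes_to_pcm` and `make_silent_wav` are hypothetical.

```python
import io
import wave
from typing import Tuple


def wav_bytes_to_pcm(wav_bytes: bytes) -> Tuple[bytes, int, int]:
    """Return (raw PCM frames, sample rate, channel count) from WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        pcm = wav.readframes(wav.getnframes())
        return pcm, wav.getframerate(), wav.getnchannels()


def make_silent_wav(sample_rate: int = 24000, num_samples: int = 240) -> bytes:
    """Build a small mono 16-bit WAV of silence to exercise the function."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * num_samples)
    return buf.getvalue()


if __name__ == "__main__":
    wav_data = make_silent_wav()
    pcm, rate, channels = wav_bytes_to_pcm(wav_data)
    # The PCM payload is shorter than the WAV file by the size of the header.
    print(len(wav_data) - len(pcm), rate, channels)
```

Downstream consumers of raw audio frames generally need the sample rate and channel count passed alongside the bytes, since that information no longer travels in a header.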
Supported Sample Rates
- 8000 Hz - Phone quality
- 16000 Hz - Standard quality
- 22050 Hz - High quality
- 24000 Hz - Premium quality (default)
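The list above can be encoded as a small lookup so that an unsupported rate is rejected before any request is made. This is a sketch under the assumption that validation happens on the caller's side; `resolve_sample_rate` is a hypothetical helper, not a library function.

```python
from typing import Optional

# Rates and labels copied from the list above.
SUPPORTED_SAMPLE_RATES = {
    8000: "Phone quality",
    16000: "Standard quality",
    22050: "High quality",
    24000: "Premium quality",
}
DEFAULT_SAMPLE_RATE = 24000


def resolve_sample_rate(requested: Optional[int] = None) -> int:
    """Return the requested rate if supported, or the default when omitted."""
    if requested is None:
        return DEFAULT_SAMPLE_RATE
    if requested not in SUPPORTED_SAMPLE_RATES:
        allowed = ", ".join(str(r) for r in sorted(SUPPORTED_SAMPLE_RATES))
        raise ValueError(
            f"unsupported sample rate {requested}; choose one of {allowed}"
        )
    return requested
```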
Language Support
Sarvam AI specializes in Indian languages with high-quality voice synthesis:
Language Code | Description | Service Code |
---|---|---|
Language.BN | Bengali | bn-IN |
Language.EN | English (India) | en-IN |
Language.GU | Gujarati | gu-IN |
Language.HI | Hindi | hi-IN |
Language.KN | Kannada | kn-IN |
Language.ML | Malayalam | ml-IN |
Language.MR | Marathi | mr-IN |
Language.OR | Odia | od-IN |
Language.PA | Punjabi | pa-IN |
Language.TA | Tamil | ta-IN |
Language.TE | Telugu | te-IN |
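The mapping in the table can be sketched as a plain dictionary. The keys below are the suffixes of the `Language.*` names written as strings, a stand-in chosen so the example runs without the framework installed; `to_service_code` is a hypothetical helper. Note that Odia maps to `od-IN`, which differs from its two-letter key.

```python
# Service codes copied from the table above; keys stand in for Language.* members.
SERVICE_CODES = {
    "BN": "bn-IN",  # Bengali
    "EN": "en-IN",  # English (India)
    "GU": "gu-IN",  # Gujarati
    "HI": "hi-IN",  # Hindi
    "KN": "kn-IN",  # Kannada
    "ML": "ml-IN",  # Malayalam
    "MR": "mr-IN",  # Marathi
    "OR": "od-IN",  # Odia
    "PA": "pa-IN",  # Punjabi
    "TA": "ta-IN",  # Tamil
    "TE": "te-IN",  # Telugu
}


def to_service_code(language: str) -> str:
    """Map a language key such as "HI" to its service code, e.g. "hi-IN"."""
    key = language.strip().upper()
    if key not in SERVICE_CODES:
        raise ValueError(f"language {language!r} is not in the supported table")
    return SERVICE_CODES[key]
```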
TTS Models
- bulbul:v1 - First generation model
- bulbul:v2 - Enhanced model with improved quality (recommended)
Usage Example
Basic Configuration
Initialize the Sarvam TTS service with your API key and desired voice.
Dynamic Configuration
Update settings at runtime by pushing a TTSUpdateSettingsFrame for the SarvamTTSService.
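Conceptually, a runtime settings update amounts to merging a set of changed values into the service's current configuration. The sketch below models that merge with standard-library types only, assuming the update carries a mapping of setting names to new values. `VoiceSettings` and `apply_settings_update` are hypothetical stand-ins, not the framework's classes, and the field names simply mirror the pitch, pace, and loudness controls mentioned in the overview.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Mapping


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical stand-in for the settings a TTS service holds."""

    voice: str
    pitch: float
    pace: float
    loudness: float


def apply_settings_update(
    current: VoiceSettings, updates: Mapping[str, Any]
) -> VoiceSettings:
    """Return a copy of `current` with `updates` applied; reject unknown keys."""
    known = {f.name for f in fields(VoiceSettings)}
    unknown = set(updates) - known
    if unknown:
        raise KeyError(f"unknown setting(s): {sorted(unknown)}")
    return replace(current, **updates)
```

Returning a new object rather than mutating in place keeps the previous settings available if an update needs to be rolled back.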
Metrics
The service provides the following metrics:
- Time to First Byte (TTFB) - Latency from text input to first audio
- Processing Duration - Total synthesis time
- Character Usage - Text processed for billing
Learn how to enable Metrics in your Pipeline.
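To illustrate what the three metrics measure, here is a standard-library sketch that times a stream of audio chunks. It is not how the service collects its metrics; `measure_synthesis` and `SynthesisMetrics` are hypothetical names, and counting characters with `len(text)` is an assumption that may not match how usage is actually billed.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SynthesisMetrics:
    ttfb_seconds: Optional[float]  # None if no audio chunk ever arrived
    processing_seconds: float
    characters: int


def measure_synthesis(text: str, chunks: Iterable[bytes]) -> SynthesisMetrics:
    """Consume a lazy stream of audio chunks and record timing for it."""
    start = time.perf_counter()
    ttfb: Optional[float] = None
    for chunk in chunks:
        # TTFB is the delay until the first non-empty chunk is received.
        if ttfb is None and chunk:
            ttfb = time.perf_counter() - start
    total = time.perf_counter() - start
    return SynthesisMetrics(ttfb, total, len(text))
```

The `chunks` argument should be a generator that starts work only when iterated, so that the timer covers the whole request.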
Additional Notes
- Language Specialization: Optimized for Indian languages with native voice quality
- Voice Quality: High-quality synthesis with natural prosody for Indian languages