Overview
Deepgram’s Aura API provides high-quality text-to-speech synthesis with streaming output and low latency. The service offers several voice models optimized for conversational AI applications.
- API Reference - Complete API documentation and method details
- Deepgram TTS Docs - Official Deepgram text-to-speech API documentation
- Example Code - Working example with Silero VAD
Installation
To use Deepgram services, install the required dependencies and set your Deepgram API key in the DEEPGRAM_API_KEY environment variable.
Get your API key from the Deepgram Console.
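Because the service needs DEEPGRAM_API_KEY at startup, it can help to fail fast with a clear message when the variable is missing. The following is a minimal standard-library sketch. require_api_key is a hypothetical helper and is not part of Pipecat or the Deepgram SDK.

```python
import os


def require_api_key(var_name: str = "DEEPGRAM_API_KEY") -> str:
    """Return the API key from the environment, or raise a descriptive error.

    Hypothetical helper: not part of Pipecat or the Deepgram SDK.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; get a key from the Deepgram Console "
            "and export it before starting the pipeline."
        )
    return key
```

Calling it once when the application starts surfaces a missing key immediately, before any audio is requested.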
Frames
Input
- TextFrame - Text content to synthesize into speech
- TTSSpeakFrame - Text that should be spoken immediately
- TTSUpdateSettingsFrame - Runtime configuration updates
- LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries
Output
- TTSStartedFrame - Signals start of synthesis
- TTSAudioRawFrame - Generated audio data chunks (streaming)
- TTSStoppedFrame - Signals completion of synthesis
- ErrorFrame - API or processing errors
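For a successful synthesis, the output frames describe one utterance in order: a TTSStartedFrame, one or more TTSAudioRawFrame chunks, then a TTSStoppedFrame. As an illustration, the sketch below checks that ordering over frame class names. It works on plain strings rather than real Pipecat frame objects, ignores ErrorFrame, and is_valid_tts_sequence is a hypothetical helper.

```python
from typing import Iterable

STARTED = "TTSStartedFrame"
AUDIO = "TTSAudioRawFrame"
STOPPED = "TTSStoppedFrame"


def is_valid_tts_sequence(frame_names: Iterable[str]) -> bool:
    """Check that each utterance is Started, one or more AudioRaw, then Stopped.

    Hypothetical helper working on frame class names, not Pipecat objects.
    """
    state = "idle"  # idle -> started -> streaming -> idle
    for name in frame_names:
        if state == "idle":
            if name != STARTED:
                return False
            state = "started"
        elif state == "started":
            if name != AUDIO:
                return False  # expect at least one audio chunk after the start frame
            state = "streaming"
        else:  # streaming
            if name == AUDIO:
                continue
            if name != STOPPED:
                return False
            state = "idle"
    return state == "idle"  # every utterance must be closed by a stop frame
```

A check like this can be useful in tests of a custom processor placed after the TTS service, where out-of-order frames usually indicate a bug.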
Voice Models
Deepgram offers various Aura voice models optimized for different use cases. Here are some highlights:
Voice Model | Description | Language |
---|---|---|
aura-2-helena-en | Natural female voice | English |
aura-2-andromeda-en | Expressive female voice | English |
aura-helios-en | Warm male voice | English |
aura-luna-en | Conversational female voice | English |
aura-stella-en | Professional female voice | English |
aura-zeus-en | Authoritative male voice | English |
Deepgram regularly adds new voice models. Check the official documentation for the latest available voices.
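The model names in the table follow a visible pattern: an aura- prefix, an optional generation number (2 for the Aura-2 voices), the voice name, and a language code. The sketch below parses that pattern, which can help when, for example, filtering voices by language in configuration. The pattern is inferred from the names listed here, not from a documented naming specification, so newer models may not fit it. parse_aura_voice and AuraVoice are hypothetical names.

```python
from typing import NamedTuple


class AuraVoice(NamedTuple):
    generation: int  # 1 for original "aura-..." names, 2 for "aura-2-..." names
    voice: str
    language: str


def parse_aura_voice(model: str) -> AuraVoice:
    """Split an Aura model name such as 'aura-2-helena-en' into its parts.

    Hypothetical helper; the pattern is inferred from the voice table.
    """
    parts = model.split("-")
    if parts[0] != "aura":
        raise ValueError(f"not an Aura model name: {model!r}")
    rest = parts[1:]
    generation = 1
    if rest and rest[0].isdigit():
        generation = int(rest[0])
        rest = rest[1:]
    if len(rest) != 2:
        raise ValueError(f"unexpected Aura model name: {model!r}")
    voice, language = rest
    return AuraVoice(generation, voice, language)
```

For example, [m for m in models if parse_aura_voice(m).language == "en"] keeps only the English voices from a list of model names.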
Supported Sample Rates
- 8000 Hz - Phone quality
- 16000 Hz - Standard quality
- 24000 Hz - High quality (default)
- 44100 Hz - CD quality
- 48000 Hz - Professional quality
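The sample rate determines how much audio data each streamed chunk carries. Assuming 16-bit, mono, linear PCM output (Deepgram's linear16 encoding; check the encoding you actually configure), the sketch below computes the byte rate and the size of a fixed-duration chunk for each rate in the list. The 20 ms chunk duration is an arbitrary example value, and pcm_bytes is a hypothetical helper.

```python
SUPPORTED_RATES_HZ = (8000, 16000, 24000, 44100, 48000)
BYTES_PER_SAMPLE = 2  # 16-bit samples (linear16)
CHANNELS = 1  # mono


def pcm_bytes(sample_rate: int, duration_ms: int) -> int:
    """Bytes of 16-bit mono PCM needed for duration_ms of audio at sample_rate."""
    samples = sample_rate * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


for rate in SUPPORTED_RATES_HZ:
    print(
        f"{rate:>5} Hz: {pcm_bytes(rate, 1000):>6} bytes/s, "
        f"{pcm_bytes(rate, 20):>4} bytes per 20 ms chunk"
    )
```

Higher rates improve fidelity but multiply bandwidth: 48000 Hz carries six times the data of 8000 Hz for the same speech.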
Integration with VAD
Deepgram TTS can be used alongside Voice Activity Detection (VAD) in either of these setups:
- Using Silero VAD (Recommended)
- Using Deepgram’s Built-in VAD Events
Usage Example
Basic Configuration
Initialize the DeepgramTTSService with your API key and use it in your pipeline.
Custom Configuration
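When customizing the service, it can help to validate settings before constructing it. The sketch below is a hypothetical settings object, not a Pipecat class. It checks the voice prefix and checks the sample rate against the Supported Sample Rates list, then produces keyword arguments. The keyword names voice and sample_rate, and the default values, are assumptions about the DeepgramTTSService constructor; confirm them in the API Reference before passing the result to the service.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict

SUPPORTED_RATES_HZ = (8000, 16000, 24000, 44100, 48000)


@dataclass(frozen=True)
class DeepgramTTSSettings:
    """Hypothetical validated settings; not part of Pipecat."""

    voice: str = "aura-2-helena-en"  # assumed default, for illustration only
    sample_rate: int = 24000

    def __post_init__(self) -> None:
        if not self.voice.startswith("aura-"):
            raise ValueError(f"expected an Aura voice model, got {self.voice!r}")
        if self.sample_rate not in SUPPORTED_RATES_HZ:
            raise ValueError(
                f"unsupported sample rate {self.sample_rate}; "
                f"choose one of {SUPPORTED_RATES_HZ}"
            )

    def as_kwargs(self) -> Dict[str, Any]:
        # Keyword names are assumptions; verify them against the API Reference.
        return asdict(self)


custom = DeepgramTTSSettings(voice="aura-2-andromeda-en", sample_rate=16000)
print(custom.as_kwargs())
```

Validating in __post_init__ means an invalid combination fails when the settings are created, rather than later when the first synthesis request is made.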
Dynamic Configuration
To change settings at runtime, push a TTSUpdateSettingsFrame to the DeepgramTTSService.
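Conceptually, a settings update is a partial mapping merged into the service's current settings, so only the keys you send change. The sketch below models that merge with plain dictionaries. It is a hypothetical illustration, not Pipecat's internal implementation, and rejecting unknown keys is an assumed design choice. In a real pipeline the partial mapping would be carried by a TTSUpdateSettingsFrame (for example with settings={"voice": "aura-2-andromeda-en"}; verify the field name in the API Reference).

```python
from typing import Any, Dict, Mapping


def apply_settings_update(
    current: Mapping[str, Any], update: Mapping[str, Any]
) -> Dict[str, Any]:
    """Return a new settings dict with `update` merged over `current`.

    Hypothetical helper: unknown keys are rejected so typos surface early.
    """
    unknown = set(update) - set(current)
    if unknown:
        raise KeyError(f"unknown setting(s): {sorted(unknown)}")
    merged = dict(current)
    merged.update(update)
    return merged


settings = {"voice": "aura-2-helena-en", "sample_rate": 24000}
settings = apply_settings_update(settings, {"voice": "aura-2-andromeda-en"})
print(settings)
```

Returning a new dict instead of mutating in place keeps the previous settings available, for example to revert a change.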
Metrics
The service provides the following metrics:
- Time to First Byte (TTFB) - Latency from text input to the first audio
- Processing Duration - Total synthesis time
- Character Usage - Number of text characters processed, relevant for billing
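The three metrics can be derived from a few timestamps per synthesis request. TTFB is the gap between sending the text and receiving the first audio chunk, processing duration runs until synthesis finishes, and character usage is the length of the text. The sketch below computes these values from caller-supplied timestamps, which in practice would come from a monotonic clock such as time.monotonic(). It is a hypothetical stand-in to show the definitions, not how Pipecat's metrics system is implemented, and ending the duration at the last audio chunk is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SynthesisTimer:
    """Hypothetical per-request timer illustrating TTFB, duration, and characters."""

    text: str
    sent_at: float  # time the text was sent for synthesis, in seconds
    chunk_times: List[float] = field(default_factory=list)

    def on_audio_chunk(self, received_at: float) -> None:
        self.chunk_times.append(received_at)

    @property
    def ttfb(self) -> Optional[float]:
        # Latency from text input to the first audio chunk.
        return self.chunk_times[0] - self.sent_at if self.chunk_times else None

    @property
    def processing_duration(self) -> Optional[float]:
        # Assumed here to end when the last audio chunk arrives.
        return self.chunk_times[-1] - self.sent_at if self.chunk_times else None

    @property
    def characters(self) -> int:
        # Character count of the synthesized text.
        return len(self.text)


timer = SynthesisTimer(text="Hello there!", sent_at=10.0)
for t in (10.25, 10.75, 11.5):
    timer.on_audio_chunk(t)
print(timer.ttfb, timer.processing_duration, timer.characters)
```

For conversational use, TTFB is usually the metric to watch, because it determines how quickly the user hears the start of a response.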
Learn how to enable Metrics in your Pipeline.
Additional Notes
- Streaming Audio: The service streams audio in chunks for low-latency playback
- Voice Selection: Choose voices based on your application’s tone and audience
- Sample Rate Matching: Ensure the TTS sample rate matches your pipeline’s audio output sample rate
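Sample rate matching matters because raw PCM carries no rate information of its own. If audio generated at one rate is played back at another without resampling, it plays at the wrong speed and pitch. The sketch below, using hypothetical helper names, computes that speed factor and adds a startup check that refuses mismatched rates. It assumes no resampling happens between the TTS service and the output.

```python
def playback_speed_factor(generated_rate_hz: int, playback_rate_hz: int) -> float:
    """Apparent speed when PCM generated at one rate is played at another
    without resampling (1.0 means correct speed)."""
    return playback_rate_hz / generated_rate_hz


def check_sample_rates(tts_rate_hz: int, output_rate_hz: int) -> None:
    """Hypothetical startup check: fail fast instead of producing distorted audio."""
    if tts_rate_hz != output_rate_hz:
        factor = playback_speed_factor(tts_rate_hz, output_rate_hz)
        raise ValueError(
            f"TTS sample rate {tts_rate_hz} Hz != output rate {output_rate_hz} Hz "
            f"(audio would play at {factor:.2f}x speed without resampling)"
        )


check_sample_rates(24000, 24000)  # matching rates pass silently
```

For example, 24000 Hz audio played at 16000 Hz runs at about two thirds of normal speed and sounds noticeably lower in pitch.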