Overview
ElevenLabs provides high-quality text-to-speech synthesis with two implementations:
- ElevenLabsTTSService: WebSocket-based with word timestamps and audio context management
- ElevenLabsHttpTTSService: HTTP-based for simpler integration
ElevenLabsTTSService is recommended for real-time applications requiring precise timing.
- API Reference: Complete API documentation and method details
- ElevenLabs TTS Docs: Official ElevenLabs text-to-speech API documentation
- Example Code: Working example with WebSocket streaming
Installation
To use ElevenLabs services, install the required dependencies and set your API key in the ELEVENLABS_API_KEY environment variable.
Get your API key by signing up at ElevenLabs.
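As a small illustration of the setup step above, the sketch below reads the key from the environment and fails early with a readable message when it is missing. The helper name require_api_key is hypothetical and not part of any library.

```python
import os


def require_api_key(env=None, name="ELEVENLABS_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    require_api_key is a hypothetical helper used only for illustration.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the app")
    return key
```

Calling require_api_key() once at startup surfaces a missing key before any connection to the service is attempted.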
Frames
Input
- TextFrame - Text content to synthesize into speech
- TTSSpeakFrame - Text that should be spoken immediately
- TTSUpdateSettingsFrame - Runtime configuration updates
- LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries
Output
- TTSStartedFrame - Signals start of synthesis
- TTSAudioRawFrame - Generated audio data chunks with word timing
- TTSStoppedFrame - Signals completion of synthesis
- ErrorFrame - API or processing errors
Service Comparison
Feature | ElevenLabsTTSService (WebSocket) | ElevenLabsHttpTTSService (HTTP) |
---|---|---|
Word Timestamps | ✅ Real-time precision | ✅ Batch processing |
Streaming | ✅ Low-latency chunks | ✅ Response streaming |
Audio Context | ✅ Advanced management | ❌ Basic |
Interruption | ✅ Context-aware | ⚠️ Limited |
Connection | WebSocket persistent | HTTP per-request |
Language Support
View All Supported Languages
Language Code | Description | Service Code |
---|---|---|
Language.AR | Arabic | ar |
Language.BG | Bulgarian | bg |
Language.CS | Czech | cs |
Language.DA | Danish | da |
Language.DE | German | de |
Language.EL | Greek | el |
Language.EN | English | en |
Language.ES | Spanish | es |
Language.FI | Finnish | fi |
Language.FIL | Filipino | fil |
Language.FR | French | fr |
Language.HI | Hindi | hi |
Language.HR | Croatian | hr |
Language.HU | Hungarian | hu |
Language.ID | Indonesian | id |
Language.IT | Italian | it |
Language.JA | Japanese | ja |
Language.KO | Korean | ko |
Language.MS | Malay | ms |
Language.NL | Dutch | nl |
Language.NO | Norwegian | no |
Language.PL | Polish | pl |
Language.PT | Portuguese | pt |
Language.RO | Romanian | ro |
Language.RU | Russian | ru |
Language.SK | Slovak | sk |
Language.SV | Swedish | sv |
Language.TA | Tamil | ta |
Language.TR | Turkish | tr |
Language.UK | Ukrainian | uk |
Language.VI | Vietnamese | vi |
Language.ZH | Chinese | zh |
- Language.EN - English
- Language.ES - Spanish
- Language.FR - French
- Language.DE - German
- Language.IT - Italian
- Language.JA - Japanese
Language support varies by model. Use multilingual models (eleven_flash_v2_5, eleven_turbo_v2_5) for language specification.
Supported Sample Rates
ElevenLabs supports specific sample rates with automatic format selection:
- 8000 Hz - pcm_8000
- 16000 Hz - pcm_16000
- 22050 Hz - pcm_22050
- 24000 Hz - pcm_24000 (default)
- 44100 Hz - pcm_44100
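The rate-to-format mapping listed above can be sketched as a plain lookup. The names SUPPORTED_FORMATS and output_format_for are hypothetical, and raising on an unsupported rate is a choice made for this sketch; the services themselves may handle that case differently.

```python
# Sample rate (Hz) -> output format name, copied from the list above.
SUPPORTED_FORMATS = {
    8000: "pcm_8000",
    16000: "pcm_16000",
    22050: "pcm_22050",
    24000: "pcm_24000",
    44100: "pcm_44100",
}
DEFAULT_SAMPLE_RATE = 24000


def output_format_for(sample_rate=DEFAULT_SAMPLE_RATE):
    """Return the PCM format name for a supported sample rate.

    Hypothetical helper: rejects any rate that is not in the list above.
    """
    try:
        return SUPPORTED_FORMATS[sample_rate]
    except KeyError:
        supported = ", ".join(str(rate) for rate in sorted(SUPPORTED_FORMATS))
        raise ValueError(
            f"Unsupported sample rate {sample_rate}; use one of: {supported}"
        ) from None
```

For example, output_format_for(16000) returns pcm_16000, while output_format_for(48000) raises ValueError.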
Model Selection
Choose the right model for your use case:
Model | Quality | Latency | Multilingual | Best For |
---|---|---|---|---|
eleven_flash_v2_5 | High | Ultra-low | ✅ | Real-time conversations |
eleven_turbo_v2_5 | High | Ultra-low | ✅ | Real-time conversations |
eleven_multilingual_v2 | High | Medium | ✅ | Quality + languages |
eleven_flash_v2 | High | Low | ❌ | English-only apps |
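One way to apply the table above in code is to encode its rows and filter them by requirement. This is only a sketch: ModelInfo, MODELS, and candidate_models are hypothetical names, and the latency labels are taken from the table rather than measured.

```python
from typing import NamedTuple


class ModelInfo(NamedTuple):
    latency: str        # "ultra-low", "low", or "medium", as in the table
    multilingual: bool  # the Multilingual column
    best_for: str       # the Best For column


# Rows copied from the model table above.
MODELS = {
    "eleven_flash_v2_5": ModelInfo("ultra-low", True, "Real-time conversations"),
    "eleven_turbo_v2_5": ModelInfo("ultra-low", True, "Real-time conversations"),
    "eleven_multilingual_v2": ModelInfo("medium", True, "Quality + languages"),
    "eleven_flash_v2": ModelInfo("low", False, "English-only apps"),
}

# Lower rank means lower latency.
LATENCY_RANK = {"ultra-low": 0, "low": 1, "medium": 2}


def candidate_models(multilingual=False, max_latency="medium"):
    """List models at or below max_latency, optionally multilingual only."""
    limit = LATENCY_RANK[max_latency]
    return [
        name
        for name, info in MODELS.items()
        if LATENCY_RANK[info.latency] <= limit
        and (info.multilingual or not multilingual)
    ]
```

Calling candidate_models(multilingual=True, max_latency="ultra-low") narrows the table to the two v2_5 models.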
Usage Example
WebSocket Service (Recommended)
Initialize the ElevenLabsTTSService with your API key and use it in your pipeline.
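A minimal sketch of the initialization step, under stated assumptions: the import path pipecat.services.elevenlabs.tts and the api_key, voice_id, and model constructor arguments are assumed and may differ between releases, and ELEVENLABS_VOICE_ID is a hypothetical environment variable name chosen for this example.

```python
import os


def websocket_tts_kwargs(env=None):
    """Collect constructor arguments from the environment.

    ELEVENLABS_VOICE_ID is a hypothetical variable name used for this sketch.
    """
    env = os.environ if env is None else env
    return {
        "api_key": env["ELEVENLABS_API_KEY"],
        "voice_id": env["ELEVENLABS_VOICE_ID"],
        "model": "eleven_flash_v2_5",
    }


def create_websocket_tts(env=None):
    """Build the WebSocket service from environment settings."""
    # Assumed import path; imported lazily so this module loads without Pipecat.
    from pipecat.services.elevenlabs.tts import ElevenLabsTTSService

    return ElevenLabsTTSService(**websocket_tts_kwargs(env))
```

The returned service would then typically sit in the pipeline after the LLM and before the output transport.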
HTTP Service
Initialize the ElevenLabsHttpTTSService and use it in a pipeline.
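The HTTP variant can be sketched in the same way. The assumptions here are that the service takes a shared aiohttp session through an aiohttp_session argument and lives at the import path shown; ELEVENLABS_VOICE_ID and the helper names are hypothetical.

```python
import os


def http_tts_kwargs(session, env=None):
    """Collect constructor arguments for the HTTP service.

    ELEVENLABS_VOICE_ID is a hypothetical variable name used for this sketch.
    """
    env = os.environ if env is None else env
    return {
        "api_key": env["ELEVENLABS_API_KEY"],
        "voice_id": env["ELEVENLABS_VOICE_ID"],
        "aiohttp_session": session,
    }


async def with_http_tts(run):
    """Open one HTTP session, build the service, and pass it to run()."""
    # Assumed third-party imports, done lazily so the sketch loads without them.
    import aiohttp
    from pipecat.services.elevenlabs.tts import ElevenLabsHttpTTSService

    async with aiohttp.ClientSession() as session:
        tts = ElevenLabsHttpTTSService(**http_tts_kwargs(session))
        await run(tts)
```

Keeping the session in an async with block closes it when the pipeline finishes, which matters because this service issues a separate HTTP request for each synthesis call.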
Dynamic Configuration
Update settings at runtime by pushing a TTSUpdateSettingsFrame to either service.
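A hedged sketch of a runtime update follows. It assumes the frame is importable from pipecat.frames.frames, accepts a settings mapping, and can be queued on a pipeline task with queue_frame; the helper names and the example setting keys are hypothetical.

```python
def settings_update(**changes):
    """Keep only the settings that were actually given a value."""
    return {key: value for key, value in changes.items() if value is not None}


async def push_settings_update(task, **changes):
    """Queue a settings update on a running pipeline task."""
    # Assumed import path and frame signature; imported lazily.
    from pipecat.frames.frames import TTSUpdateSettingsFrame

    frame = TTSUpdateSettingsFrame(settings=settings_update(**changes))
    await task.queue_frame(frame)
```

For instance, settings_update(voice="new-voice-id", speed=None) yields a mapping with only the voice entry, so unset values never overwrite the current configuration.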
Voice Customization
ElevenLabs offers extensive voice control parameters, including stability and similarity_boost.
Metrics
Both services provide comprehensive metrics:
- Time to First Byte (TTFB) - Latency from text input to first audio
- Processing Duration - Total synthesis time
- Character Usage - Text processed for billing
Learn how to enable Metrics in your Pipeline.
Additional Notes
- WebSocket Recommended: Use ElevenLabsTTSService for real-time applications with word timestamps and audio context management
- Connection Management: WebSocket maintains persistent connection with automatic keepalive (10-second intervals)
- Audio Context: WebSocket service manages multiple audio contexts for handling interruptions and overlapping requests
- Voice Settings: Both stability and similarity_boost must be set together for voice customization
- Language Specification: Only works with multilingual models (eleven_flash_v2_5, eleven_turbo_v2_5, eleven_multilingual_v2)
- Sample Rate Constraints: Must use supported sample rates (8000, 16000, 22050, 24000, or 44100 Hz)
- SSML Support: Enable SSML parsing for advanced speech markup control