Overview
Google Cloud Text-to-Speech provides high-quality speech synthesis with two implementations:
- GoogleTTSService: WebSocket-based streaming service
- GoogleHttpTTSService: HTTP-based streaming service

GoogleTTSService offers the lowest latency and is the recommended option.

- API Reference: Complete API documentation and method details
- Google Cloud TTS Docs: Official Google Cloud Text-to-Speech documentation
- Example Code: Working example with Chirp 3 HD voice
Installation
To use Google services, install the required dependencies.

You also need Google Cloud credentials, which can be supplied in any of these ways:
- Environment variable: GOOGLE_APPLICATION_CREDENTIALS (path to service account JSON)
- Service account JSON string
- Service account file path

Create a service account in the Google Cloud Console with Text-to-Speech API permissions.
Frames
Input
- TextFrame - Text content to synthesize into speech
- TTSSpeakFrame - Text that should be spoken immediately
- TTSUpdateSettingsFrame - Runtime configuration updates
- LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries
Output
- TTSStartedFrame - Signals start of synthesis
- TTSAudioRawFrame - Generated audio data (PCM format)
- TTSStoppedFrame - Signals completion of synthesis
- ErrorFrame - Google Cloud API or processing errors
Service Comparison
Feature | GoogleTTSService (Streaming) | GoogleHttpTTSService (HTTP) |
---|---|---|
Streaming | ✅ Real-time chunks | ❌ Single audio block |
Latency | 🚀 Ultra-low | 📈 Higher |
Voice Support | Chirp 3 HD, Journey only | All Google voices |
SSML Support | ❌ Plain text only | ✅ Full SSML |
Customization | ⚠️ Basic | ✅ Extensive |
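The comparison above can be read as a simple selection rule. The sketch below is illustrative only: `choose_service` is a hypothetical helper, not part of the library, and it assumes that streaming-capable voices can be recognized by "Chirp3-HD" or "Journey" appearing in the voice name.

```python
def choose_service(voice_id: str, needs_ssml: bool = False) -> str:
    """Return the name of the service class suggested by the comparison table.

    Hypothetical helper: the substring check on the voice name is an
    assumption about Google's voice naming, not a library feature.
    """
    streaming_voice = any(tag in voice_id for tag in ("Chirp3-HD", "Journey"))
    if needs_ssml and streaming_voice:
        # The table lists Chirp 3 HD and Journey voices as plain text only.
        raise ValueError(f"{voice_id} does not accept SSML; pick another voice")
    if needs_ssml or not streaming_voice:
        # SSML and all other voices are listed only for the HTTP service.
        return "GoogleHttpTTSService"
    # Plain text with a Chirp 3 HD or Journey voice: take the low-latency path.
    return "GoogleTTSService"


print(choose_service("en-US-Chirp3-HD-Charon"))
print(choose_service("en-US-Neural2-A", needs_ssml=True))
```

Raising on the SSML-plus-Chirp combination is a design choice for the sketch: it surfaces the conflict early instead of silently switching voices.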
Language Support
View All Supported Languages
Language Code | Description | Service Code |
---|---|---|
Language.AF | Afrikaans | af-ZA |
Language.AR | Arabic | ar-XA |
Language.BN | Bengali | bn-IN |
Language.BG | Bulgarian | bg-BG |
Language.CA | Catalan | ca-ES |
Language.ZH | Chinese (Mandarin) | cmn-CN |
Language.ZH_TW | Chinese (Taiwan) | cmn-TW |
Language.ZH_HK | Chinese (Hong Kong) | yue-HK |
Language.CS | Czech | cs-CZ |
Language.DA | Danish | da-DK |
Language.NL | Dutch | nl-NL |
Language.NL_BE | Dutch (Belgium) | nl-BE |
Language.EN | English (US) | en-US |
Language.EN_AU | English (Australia) | en-AU |
Language.EN_GB | English (UK) | en-GB |
Language.EN_IN | English (India) | en-IN |
Language.ET | Estonian | et-EE |
Language.FIL | Filipino | fil-PH |
Language.FI | Finnish | fi-FI |
Language.FR | French | fr-FR |
Language.FR_CA | French (Canada) | fr-CA |
Language.GL | Galician | gl-ES |
Language.DE | German | de-DE |
Language.EL | Greek | el-GR |
Language.GU | Gujarati | gu-IN |
Language.HE | Hebrew | he-IL |
Language.HI | Hindi | hi-IN |
Language.HU | Hungarian | hu-HU |
Language.IS | Icelandic | is-IS |
Language.ID | Indonesian | id-ID |
Language.IT | Italian | it-IT |
Language.JA | Japanese | ja-JP |
Language.KN | Kannada | kn-IN |
Language.KO | Korean | ko-KR |
Language.LV | Latvian | lv-LV |
Language.LT | Lithuanian | lt-LT |
Language.MS | Malay | ms-MY |
Language.ML | Malayalam | ml-IN |
Language.MR | Marathi | mr-IN |
Language.NO | Norwegian | nb-NO |
Language.PA | Punjabi | pa-IN |
Language.PL | Polish | pl-PL |
Language.PT | Portuguese | pt-PT |
Language.PT_BR | Portuguese (Brazil) | pt-BR |
Language.RO | Romanian | ro-RO |
Language.RU | Russian | ru-RU |
Language.SR | Serbian | sr-RS |
Language.SK | Slovak | sk-SK |
Language.ES | Spanish | es-ES |
Language.ES_US | Spanish (US) | es-US |
Language.SV | Swedish | sv-SE |
Language.TA | Tamil | ta-IN |
Language.TE | Telugu | te-IN |
Language.TH | Thai | th-TH |
Language.TR | Turkish | tr-TR |
Language.UK | Ukrainian | uk-UA |
Language.VI | Vietnamese | vi-VN |
Commonly used languages include:
- Language.EN_US - English (US)
- Language.EN_GB - English (UK)
- Language.FR - French
- Language.DE - German
- Language.ES - Spanish
- Language.IT - Italian
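To make the mapping in the table concrete, here is a minimal sketch of a lookup from a language member to its Google service code. The `Language` enum below is a small stand-in defined only for this example (its member values are assumptions), and `to_service_code` and `voice_matches_language` are hypothetical helpers; the real library performs this conversion internally.

```python
from enum import Enum
from typing import Optional


class Language(Enum):
    """Stand-in for the library's Language enum (small subset, assumed values)."""
    EN = "en"
    EN_GB = "en-GB"
    FR = "fr"
    ZH_HK = "zh-HK"


# Subset of the table above: Language member -> Google service code.
SERVICE_CODES = {
    Language.EN: "en-US",
    Language.EN_GB: "en-GB",
    Language.FR: "fr-FR",
    Language.ZH_HK: "yue-HK",
}


def to_service_code(language: Language) -> Optional[str]:
    """Return the Google service code, or None if the language is not mapped."""
    return SERVICE_CODES.get(language)


def voice_matches_language(voice_id: str, service_code: str) -> bool:
    """Check that a voice name starts with the service code, e.g. 'en-US-...'."""
    return voice_id.startswith(service_code + "-")


print(to_service_code(Language.ZH_HK))
print(voice_matches_language("en-US-Chirp3-HD-Charon", "en-US"))
```

The prefix check reflects the note that voice selection should match the language code; it relies on Google voice names beginning with their locale.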
Credential Setup
Environment Variable Method
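With this method, GOOGLE_APPLICATION_CREDENTIALS is set in the shell or process environment to the path of the service account JSON file, and no credentials are passed in code. The sketch below is a hypothetical pre-flight check, not part of the library: it confirms the variable is set and that the file looks like a service account key before the service starts.

```python
import json
import os


def check_credentials_env(var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> str:
    """Return the credentials path from the environment after basic validation."""
    path = os.environ.get(var)
    if not path:
        raise RuntimeError(f"{var} is not set")
    with open(path, encoding="utf-8") as fh:
        info = json.load(fh)  # raises if the file is not valid JSON
    # Service account key files carry a "type" field with this value.
    if info.get("type") != "service_account":
        raise ValueError(f"{path} does not look like a service account key")
    return path
```

Calling `check_credentials_env()` at startup turns a late authentication failure into an immediate, readable error.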
Direct Credentials
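Credentials can also be handed to the service directly, either as a service account JSON string or as a file path. The following sketch builds the keyword arguments for either case. The parameter names `credentials` and `credentials_path` are assumptions about the service constructor, and `credential_kwargs` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Optional


def credential_kwargs(json_string: Optional[str] = None,
                      file_path: Optional[str] = None) -> dict:
    """Build constructor keyword arguments for direct credentials.

    Assumed parameter names: `credentials` (JSON string) and
    `credentials_path` (path to the JSON file).
    """
    if json_string:
        json.loads(json_string)  # fail early on malformed JSON
        return {"credentials": json_string}
    if file_path:
        if not Path(file_path).is_file():
            raise FileNotFoundError(file_path)
        return {"credentials_path": file_path}
    # Nothing passed: rely on GOOGLE_APPLICATION_CREDENTIALS instead.
    return {}
```

The result would then be unpacked into the service constructor, for example `GoogleTTSService(**credential_kwargs(json_string=...), voice_id=...)`, again assuming those parameter names.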
Usage Example
Streaming Service (Recommended for Real-time)
Initialize GoogleTTSService and use it in a pipeline:
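A minimal sketch of the streaming setup follows. It assumes the library is Pipecat, that the class is importable from `pipecat.services.google.tts`, and that the constructor accepts `credentials` and `voice_id`; `GOOGLE_TTS_CREDENTIALS` is a hypothetical environment variable holding the service account JSON string. Check these names against the API reference before relying on them.

```python
import os


def streaming_tts_kwargs() -> dict:
    """Constructor arguments for the streaming service (assumed parameter names)."""
    return {
        # Hypothetical variable; None falls back to GOOGLE_APPLICATION_CREDENTIALS.
        "credentials": os.getenv("GOOGLE_TTS_CREDENTIALS"),
        # Streaming supports only Chirp 3 HD and Journey voices.
        "voice_id": "en-US-Chirp3-HD-Charon",
    }


def create_streaming_tts():
    """Create the streaming TTS service."""
    # Imported lazily so this module loads even without the library installed.
    from pipecat.services.google.tts import GoogleTTSService  # assumed import path

    return GoogleTTSService(**streaming_tts_kwargs())


# In a pipeline the service typically sits between the LLM and the transport
# output, for example: [..., llm, create_streaming_tts(), transport.output()]
```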
HTTP Service (Full SSML Support)
Initialize GoogleHttpTTSService for more customization options:
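The HTTP service can be sketched the same way. As before, the import path and the `voice_id` and `params` arguments are assumptions, and so are the `pitch` and `rate` fields of `InputParams`, which are taken here to map onto SSML prosody attributes. The voice name is only an example of a non-Chirp voice.

```python
def http_tts_settings() -> dict:
    """Example settings for the HTTP service (assumed field names)."""
    return {
        "voice_id": "en-US-Neural2-A",
        # Assumed InputParams fields, expressed as SSML prosody values.
        "params": {"pitch": "+2st", "rate": "110%"},
    }


def create_http_tts():
    """Create the HTTP TTS service with prosody customization."""
    # Imported lazily so this module loads even without the library installed.
    from pipecat.services.google.tts import GoogleHttpTTSService  # assumed import path

    settings = http_tts_settings()
    return GoogleHttpTTSService(
        voice_id=settings["voice_id"],
        params=GoogleHttpTTSService.InputParams(**settings["params"]),
    )
```

No credentials are passed in this sketch, so it relies on the environment variable method described earlier.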
Dynamic Configuration
Update settings at runtime by pushing a TTSUpdateSettingsFrame:
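As a rough sketch of a runtime update, the code below builds a settings dictionary and wraps it in the frame. It assumes the frame lives in `pipecat.frames.frames`, takes a `settings` dictionary, and that `"voice"` is a recognized key; `build_tts_settings` and `make_update_frame` are hypothetical helpers.

```python
def build_tts_settings(**changes) -> dict:
    """Collect only the settings that were actually given a value."""
    return {key: value for key, value in changes.items() if value is not None}


def make_update_frame(**changes):
    """Wrap the changed settings in a TTSUpdateSettingsFrame."""
    # Imported lazily so this module loads even without the library installed.
    from pipecat.frames.frames import TTSUpdateSettingsFrame  # assumed import path

    return TTSUpdateSettingsFrame(settings=build_tts_settings(**changes))


# Inside a running pipeline task (assumed API):
#   await task.queue_frame(make_update_frame(voice="en-US-Chirp3-HD-Kore"))
print(build_tts_settings(voice="en-US-Chirp3-HD-Kore", language=None))
```

Dropping `None` values keeps the update limited to the settings that should change, leaving everything else as configured.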
Metrics
Both services provide comprehensive metrics:
- Time to First Byte (TTFB) - Latency from text input to first audio
- Processing Duration - Total synthesis time
- Character Usage - Text processed for billing
Learn how to enable Metrics in your Pipeline.
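The services collect these metrics themselves; the sketch below only illustrates what each one measures. `measure_synthesis` and `fake_synth` are hypothetical names, and the fake synthesizer stands in for any source of audio chunks.

```python
import time
from typing import Iterable, Iterator


def measure_synthesis(text: str, chunks: Iterable[bytes]) -> dict:
    """Consume audio chunks and record TTFB, total duration, and character count."""
    start = time.perf_counter()
    ttfb = None
    audio_bytes = 0
    for chunk in chunks:
        if ttfb is None:
            # Time from handing over the text to receiving the first audio.
            ttfb = time.perf_counter() - start
        audio_bytes += len(chunk)
    return {
        "ttfb_s": ttfb,
        "processing_s": time.perf_counter() - start,
        "characters": len(text),  # basis for character usage
        "audio_bytes": audio_bytes,
    }


def fake_synth(text: str) -> Iterator[bytes]:
    """Stand-in synthesizer: yields three silent chunks with a small delay."""
    for _ in range(3):
        time.sleep(0.01)
        yield b"\x00" * 320


metrics = measure_synthesis("hello", fake_synth("hello"))
print(metrics["characters"], metrics["audio_bytes"])
```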
Additional Notes
- Voice Compatibility: The streaming service supports only Chirp 3 HD and Journey voices
- SSML Limitations: Chirp and Journey voices don't support SSML; use plain-text input with them
- Credential Management: Both services accept an environment variable, a JSON string, or a file path for authentication
- Regional Voices: Match the voice to the language code for best results
- Streaming Advantage: Use the streaming service for conversational AI that requires ultra-low latency
- HTTP Advantage: Use the HTTP service when you need extensive voice customization via SSML