Overview
AWS Polly provides text-to-speech synthesis through Amazon’s cloud service, with support for standard, neural, and generative engines. The service offers extensive language support, SSML features, and voice customization options, including prosody controls for pitch, rate, and volume.
- API Reference: Complete API documentation and method details
- AWS Polly Docs: Official AWS Polly documentation and features
- Example Code: Working example with generative engine
Installation
To use AWS Polly services, install the required dependencies, then provide AWS credentials through the following environment variables:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_SESSION_TOKEN (if using temporary credentials)
- AWS_REGION (defaults to “us-east-1”)
Set up AWS credentials through the AWS Console or use AWS CLI configuration.
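As a minimal sketch of how these variables might be read before constructing the service, the helper below collects them into a dictionary and applies the documented default region. The function name `resolve_aws_settings` and the returned key names are hypothetical and are not part of the service's API.

```python
import os
from typing import Mapping, Optional


def resolve_aws_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect AWS credentials and region from environment-style variables.

    Hypothetical helper for illustration; not part of any library.
    """
    env = os.environ if env is None else env

    # The access key pair is always required.
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing required variables: " + ", ".join(missing))

    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        # Only needed for temporary credentials, so it may be absent.
        "aws_session_token": env.get("AWS_SESSION_TOKEN"),
        # Falls back to the documented default region.
        "region": env.get("AWS_REGION") or "us-east-1",
    }
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to exercise in tests, for example `resolve_aws_settings({"AWS_ACCESS_KEY_ID": "...", "AWS_SECRET_ACCESS_KEY": "..."})`.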
Frames
Input
- TextFrame - Text content to synthesize into speech
- TTSSpeakFrame - Text that should be spoken immediately
- TTSUpdateSettingsFrame - Runtime configuration updates
- LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries
Output
- TTSStartedFrame - Signals start of synthesis
- TTSAudioRawFrame - Generated audio data (PCM, resampled from 16kHz)
- TTSStoppedFrame - Signals completion of synthesis
- ErrorFrame - AWS API or processing errors
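To illustrate what "resampled from 16kHz" means for the raw audio frames, here is a naive linear-interpolation resampler for 16-bit mono PCM. This is only a sketch of the idea: the resampler the service actually uses is not described here and is likely of higher quality, and the function name `resample_pcm16` is hypothetical.

```python
from array import array


def resample_pcm16(pcm: bytes, src_rate: int, dst_rate: int) -> bytes:
    """Resample 16-bit mono PCM using naive linear interpolation.

    Samples are read in the machine's native byte order, which matches
    little-endian PCM on typical hardware.
    """
    samples = array("h")
    samples.frombytes(pcm)
    if src_rate == dst_rate:
        return samples.tobytes()

    out_len = len(samples) * dst_rate // src_rate
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = array("h")
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)  # clamp at the final sample
        frac = pos - left
        # Blend the two neighbouring source samples.
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))
    return out.tobytes()
```

For example, `resample_pcm16(chunk, 16000, 24000)` would stretch a 16 kHz chunk to a 24 kHz output rate. Linear interpolation introduces audible artifacts on real speech, so a dedicated resampling library is the better choice in practice.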
Language Support
View All Supported Languages
Language Code | Description | Service Code |
---|---|---|
Language.AR | Arabic | arb |
Language.AR_AE | Arabic (UAE) | ar-AE |
Language.CA | Catalan | ca-ES |
Language.ZH | Chinese (Mandarin) | cmn-CN |
Language.YUE | Chinese (Cantonese) | yue-CN |
Language.CS | Czech | cs-CZ |
Language.DA | Danish | da-DK |
Language.NL | Dutch | nl-NL |
Language.NL_BE | Dutch (Belgium) | nl-BE |
Language.EN | English (US) | en-US |
Language.EN_AU | English (Australia) | en-AU |
Language.EN_GB | English (UK) | en-GB |
Language.EN_IN | English (India) | en-IN |
Language.EN_NZ | English (New Zealand) | en-NZ |
Language.EN_ZA | English (South Africa) | en-ZA |
Language.FI | Finnish | fi-FI |
Language.FR | French | fr-FR |
Language.FR_BE | French (Belgium) | fr-BE |
Language.FR_CA | French (Canada) | fr-CA |
Language.DE | German | de-DE |
Language.DE_AT | German (Austria) | de-AT |
Language.DE_CH | German (Switzerland) | de-CH |
Language.HI | Hindi | hi-IN |
Language.IS | Icelandic | is-IS |
Language.IT | Italian | it-IT |
Language.JA | Japanese | ja-JP |
Language.KO | Korean | ko-KR |
Language.NO | Norwegian | nb-NO |
Language.PL | Polish | pl-PL |
Language.PT | Portuguese | pt-PT |
Language.PT_BR | Portuguese (Brazil) | pt-BR |
Language.RO | Romanian | ro-RO |
Language.RU | Russian | ru-RU |
Language.ES | Spanish | es-ES |
Language.ES_MX | Spanish (Mexico) | es-MX |
Language.ES_US | Spanish (US) | es-US |
Language.SV | Swedish | sv-SE |
Language.TR | Turkish | tr-TR |
Language.CY | Welsh | cy-GB |
Common languages include:
- Language.EN - English (US)
- Language.ES - Spanish
- Language.FR - French
- Language.DE - German
- Language.IT - Italian
- Language.JA - Japanese
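The mapping from language codes to Polly service codes in the table above can be pictured as a simple lookup. The sketch below copies a subset of the table; the helper name `to_polly_code` is hypothetical, and the fallback from a regional variant to its base language is an assumption of this example, not documented behavior.

```python
from typing import Optional

# Subset of the table above: Language member name -> Polly service code.
LANGUAGE_TO_POLLY = {
    "AR": "arb",
    "ZH": "cmn-CN",
    "EN": "en-US",
    "EN_GB": "en-GB",
    "FR": "fr-FR",
    "DE": "de-DE",
    "JA": "ja-JP",
    "ES": "es-ES",
    "ES_MX": "es-MX",
}


def to_polly_code(language_name: str) -> Optional[str]:
    """Look up a Polly service code for names like 'Language.EN_GB' or 'EN_GB'."""
    name = language_name.split(".")[-1].upper().replace("-", "_")
    if name in LANGUAGE_TO_POLLY:
        return LANGUAGE_TO_POLLY[name]
    # Assumption: fall back to the base language when the variant is unknown.
    base = name.split("_")[0]
    return LANGUAGE_TO_POLLY.get(base)
```

Note that some service codes do not follow the usual pattern (Arabic is `arb` and Mandarin is `cmn-CN`), which is why an explicit table is safer than deriving the code from the language name.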
Usage Example
Basic Configuration
Initialize the AWSPollyTTSService and use it in a pipeline.
Dynamic Configuration
Update settings at runtime by pushing a TTSUpdateSettingsFrame for the AWSPollyTTSService.
SSML Features
The service automatically constructs SSML from the configured parameters for advanced speech control.
Metrics
The service reports the following metrics:
- Time to First Byte (TTFB) - Latency from text input to first audio
- Processing Duration - Total synthesis time
- Character Usage - Text processed for billing
Learn how to enable Metrics in your Pipeline.
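To make the three metrics concrete, the sketch below measures them around a generator that yields audio chunks. Both `measure_synthesis` and `fake_synthesize` are hypothetical stand-ins written for this example; the service collects its metrics internally once metrics are enabled in the pipeline.

```python
import time
from typing import Callable, Iterator


def measure_synthesis(text: str, synthesize: Callable[[str], Iterator[bytes]]) -> dict:
    """Time a chunked synthesis call and count the characters processed."""
    start = time.perf_counter()
    ttfb = None
    audio_bytes = 0
    for chunk in synthesize(text):
        if ttfb is None:
            # Time to First Byte: delay until the first audio chunk arrives.
            ttfb = time.perf_counter() - start
        audio_bytes += len(chunk)
    return {
        "ttfb": ttfb,
        "processing": time.perf_counter() - start,  # total synthesis time
        "characters": len(text),  # basis for usage and billing
        "audio_bytes": audio_bytes,
    }


def fake_synthesize(text: str) -> Iterator[bytes]:
    """Hypothetical stand-in for a TTS call: one silent chunk per word."""
    for word in text.split():
        time.sleep(0.001)  # simulate network and synthesis latency
        yield b"\x00\x00" * len(word)
```

Calling `measure_synthesis("hello world", fake_synthesize)` returns a dictionary in which `ttfb` is never larger than `processing`, since the first chunk always arrives before the stream ends.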
Additional Notes
- Engine Selection: Use generative for the highest quality, neural for a balance of quality and latency, and standard for the lowest latency
- Region Requirements: The generative engine is only available in select regions (for example, us-west-2 and us-east-1)
- Audio Format: The service outputs PCM audio resampled from 16kHz to your specified rate
- Credential Management: Supports both environment variables and direct credential passing
- Automatic SSML: The service automatically wraps text in appropriate SSML tags based on parameters
- Prosody Limitations: The generative engine supports only rate adjustment, not pitch or volume
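The last two notes can be sketched together: text is wrapped in SSML tags according to the configured parameters, and pitch and volume are dropped for the generative engine. This is an illustrative approximation, not the service's actual implementation; the function `build_ssml` and its parameter names are hypothetical.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def build_ssml(
    text: str,
    engine: str = "neural",
    language: Optional[str] = None,
    rate: Optional[str] = None,
    pitch: Optional[str] = None,
    volume: Optional[str] = None,
) -> str:
    """Wrap text in <speak>, with optional <lang> and <prosody> tags."""
    attrs = {"rate": rate}
    if engine != "generative":
        # Per the notes above, the generative engine only supports rate.
        attrs.update({"pitch": pitch, "volume": volume})
    prosody = " ".join(f"{k}={quoteattr(v)}" for k, v in attrs.items() if v)

    body = escape(text)  # escape &, <, > so the text cannot break the markup
    if prosody:
        body = f"<prosody {prosody}>{body}</prosody>"
    if language:
        body = f"<lang xml:lang={quoteattr(language)}>{body}</lang>"
    return f"<speak>{body}</speak>"
```

For instance, `build_ssml("Hi & bye", engine="generative", rate="fast", pitch="high")` yields `<speak><prosody rate="fast">Hi &amp; bye</prosody></speak>`: the pitch setting is ignored and the ampersand is escaped.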