Overview
NVIDIA Riva provides high-quality text-to-speech synthesis through cloud-based AI models accessed over a gRPC API. The service offers multilingual support, configurable quality settings, and streaming audio generation optimized for real-time applications.
Installation
To use NVIDIA Riva services, install the required dependencies and set your API key in the NVIDIA_API_KEY environment variable.
You can get an API key, along with access to Riva services, from the NVIDIA Developer Portal.
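Because the key is read from the NVIDIA_API_KEY environment variable, it helps to fail early with a clear message when the key is missing. The sketch below does that using only the standard library. `load_nvidia_api_key` is a hypothetical helper for illustration and is not part of any Riva or Pipecat API.

```python
import os


def load_nvidia_api_key(env_var: str = "NVIDIA_API_KEY") -> str:
    """Hypothetical helper: return the API key from the environment or fail loudly."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"{env_var} is not set; get a key from the NVIDIA Developer Portal"
        )
    return api_key


if __name__ == "__main__":
    # Run after exporting NVIDIA_API_KEY in your shell.
    try:
        key = load_nvidia_api_key()
        print(f"Loaded API key ending in ...{key[-4:]}")
    except RuntimeError as err:
        print(err)
```

Stripping whitespace treats an accidentally blank or space-only value as missing. That avoids a less obvious authentication error later.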
Frames
Input
- TextFrame - Text content to synthesize into speech
- TTSSpeakFrame - Text that should be spoken immediately
- TTSUpdateSettingsFrame - Runtime configuration updates
- LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries
Output
- TTSStartedFrame - Signals the start of synthesis
- TTSAudioRawFrame - Generated audio data chunks (streamed)
- TTSStoppedFrame - Signals completion of synthesis
- ErrorFrame - API or processing errors
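For a single synthesis, the output frames arrive in a fixed order: one start signal, any number of streamed audio chunks, then one stop signal. The sketch below models that ordering and reassembles the chunks.

The classes are local stand-ins that only borrow the frame names listed above. They are not the real frame classes, and their fields (`audio`, `sample_rate`, `num_channels`) are assumptions chosen for illustration. The audio bytes are placeholder data.

```python
from dataclasses import dataclass
from typing import Iterator, List, Union


# Hypothetical stand-ins mirroring the output frame names; NOT the real classes.
@dataclass
class TTSStartedFrame:
    pass


@dataclass
class TTSAudioRawFrame:
    audio: bytes
    sample_rate: int
    num_channels: int = 1


@dataclass
class TTSStoppedFrame:
    pass


Frame = Union[TTSStartedFrame, TTSAudioRawFrame, TTSStoppedFrame]


def fake_synthesis(text: str, chunk_bytes: int = 4, sample_rate: int = 16000) -> Iterator[Frame]:
    """Simulate one synthesis: start frame, streamed audio chunks, stop frame."""
    yield TTSStartedFrame()
    audio = b"\x00\x01" * len(text)  # placeholder: 2 bytes of "audio" per character
    for i in range(0, len(audio), chunk_bytes):
        yield TTSAudioRawFrame(audio=audio[i : i + chunk_bytes], sample_rate=sample_rate)
    yield TTSStoppedFrame()


def collect_audio(frames: Iterator[Frame]) -> bytes:
    """Join streamed chunks, rejecting audio outside a started/stopped pair."""
    started = stopped = False
    parts: List[bytes] = []
    for frame in frames:
        if isinstance(frame, TTSStartedFrame):
            started = True
        elif isinstance(frame, TTSAudioRawFrame):
            if not started or stopped:
                raise ValueError("audio chunk outside a started/stopped pair")
            parts.append(frame.audio)
        elif isinstance(frame, TTSStoppedFrame):
            stopped = True
    if not (started and stopped):
        raise ValueError("incomplete synthesis")
    return b"".join(parts)


print(len(collect_audio(fake_synthesis("Hello"))))  # 10
```

Because audio is streamed, a consumer can begin playback as soon as the first TTSAudioRawFrame arrives, rather than waiting for the stop signal.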
Available Models
Model | Description | Best For |
---|---|---|
magpie-tts-multilingual | Multilingual model with natural voices | Conversational AI, multiple languages |
fastpitch-hifigan-tts | High-quality English synthesis | English-only applications |
The magpie-tts-multilingual model is the default and is recommended for most use cases because of its multilingual capabilities and natural voice quality.
Language Support
The magpie-tts-multilingual model supports the following languages:
Language Code | Description | Service Code |
---|---|---|
Language.EN_US | English (US) | en-US |
Language.ES_US | Spanish (US) | es-US |
Language.FR_FR | French (France) | fr-FR |
Language.DE_DE | German (Germany) | de-DE |
Language.IT_IT | Italian (Italy) | it-IT |
Language.ZH_CN | Chinese (China) | zh-CN |
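The table maps each Language enum value to the service code sent to Riva. The sketch below expresses that lookup in both directions and rejects languages not listed for magpie-tts-multilingual.

The `Language` enum here is a local stand-in built from the table, not the real enum. `to_service_code` and `from_service_code` are hypothetical helpers; the service's own conversion logic may differ.

```python
from enum import Enum


class Language(Enum):
    """Local stand-in mirroring the table above (member name -> service code)."""
    EN_US = "en-US"
    ES_US = "es-US"
    FR_FR = "fr-FR"
    DE_DE = "de-DE"
    IT_IT = "it-IT"
    ZH_CN = "zh-CN"


def to_service_code(language: Language) -> str:
    """Return the service code for a supported language."""
    return language.value


def from_service_code(code: str) -> Language:
    """Look up a supported language by service code, ignoring case (e.g. "fr-fr")."""
    normalized = code.strip().lower()
    for language in Language:
        if language.value.lower() == normalized:
            return language
    raise ValueError(f"unsupported language for magpie-tts-multilingual: {code!r}")


print(to_service_code(Language.DE_DE))  # de-DE
print(from_service_code("zh-cn"))  # Language.ZH_CN
```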
Usage Example
Basic Configuration
Initialize the Riva TTS service with your API key and the desired voice.
Dynamic Configuration
Update settings at runtime by pushing a TTSUpdateSettingsFrame to the RivaTTSService.
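The two configuration steps above split settings into two groups. Some are fixed when the service is constructed, and others can change while the pipeline runs. As the Additional Notes state, the model is one of the fixed settings.

The sketch below models that split. `RivaTTSSettings` and `SettingsHolder` are hypothetical, and their field names (`voice_id`, `language`, `quality`, `model_name`) are illustrative assumptions. This is not how RivaTTSService or TTSUpdateSettingsFrame is implemented. The voice ID is a placeholder, not a real Riva voice.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class RivaTTSSettings:
    """Hypothetical settings record; field names are illustrative assumptions."""
    api_key: str
    voice_id: str
    language: str = "en-US"
    quality: Optional[int] = None  # None here means "use the service default"
    model_name: str = "magpie-tts-multilingual"


class SettingsHolder:
    """Applies runtime updates, but keeps the model fixed after construction."""

    IMMUTABLE = frozenset({"model_name"})

    def __init__(self, settings: RivaTTSSettings) -> None:
        self.settings = settings  # basic configuration, set once at startup

    def update(self, changes: Dict[str, Any]) -> RivaTTSSettings:
        # Dynamic configuration: roughly what a settings-update frame would carry.
        blocked = self.IMMUTABLE.intersection(changes)
        if blocked:
            raise ValueError(f"cannot change {sorted(blocked)} after initialization")
        self.settings = replace(self.settings, **changes)
        return self.settings


holder = SettingsHolder(RivaTTSSettings(api_key="placeholder", voice_id="example-voice"))
print(holder.update({"language": "es-US"}).language)  # es-US
```

Using a frozen dataclass means each update yields a new settings object. A rejected update therefore leaves the previous settings untouched.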
Metrics
The service reports the following metrics:
- Time to First Byte (TTFB) - Latency from text input to the first audio chunk
- Processing Duration - Total synthesis time
- Character Usage - Number of text characters processed, used for billing
Learn how to enable metrics in your pipeline.
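The three metrics can be derived from a streamed response with a single clock. TTFB is the time until the first chunk arrives, processing duration is the time until the stream ends, and character usage is the length of the input text.

The sketch below measures them over a simulated stream. `measure` and `slow_stream` are hypothetical helpers. Counting characters with `len(text)` is an assumption, since the service's own billing count may differ.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class SynthesisMetrics:
    ttfb_s: float        # seconds until the first audio chunk
    processing_s: float  # seconds until the stream is exhausted
    characters: int      # characters of input text (assumed billing unit)


def measure(text: str, chunks: Iterable[bytes]) -> SynthesisMetrics:
    """Consume an audio chunk stream and compute TTFB, duration and character usage."""
    start = time.perf_counter()
    first_chunk_at = None
    for _chunk in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter()
    end = time.perf_counter()
    # If no audio arrives at all, TTFB falls back to the total duration.
    ttfb = (first_chunk_at if first_chunk_at is not None else end) - start
    return SynthesisMetrics(ttfb_s=ttfb, processing_s=end - start, characters=len(text))


def slow_stream(n: int, delay_s: float) -> Iterator[bytes]:
    """Simulated lazy stream: each chunk 'arrives' after delay_s seconds."""
    for _ in range(n):
        time.sleep(delay_s)
        yield b"\x00" * 320


m = measure("Hello world", slow_stream(3, 0.01))
print(f"TTFB={m.ttfb_s:.3f}s duration={m.processing_s:.3f}s chars={m.characters}")
```

The stream is a generator, so no time elapses until `measure` starts iterating. This keeps the start timestamp aligned with the request rather than with stream creation.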
Additional Notes
- Model Set at Initialization: Models cannot be changed after initialization; configure model_function_map during construction.
- Deprecated Classes: FastPitchTTSService is deprecated; use RivaTTSService instead.
- Quality vs. Speed: Higher quality settings increase synthesis time but improve audio quality.