Overview
Piper provides high-quality neural text-to-speech synthesis through a self-hosted HTTP server. The service offers complete privacy and control with no external API dependencies, making it ideal for on-premise deployments and applications requiring data sovereignty.
API Reference
Complete API documentation and method details
Piper TTS Docs
Official Piper TTS documentation and setup
Installation
To use Piper services, no additional Pipecat dependencies are required.
Piper runs entirely locally, providing complete privacy and eliminating API key requirements.
Frames
Input
- TextFrame - Text content to synthesize into speech
- TTSSpeakFrame - Text that should be spoken immediately
- TTSUpdateSettingsFrame - Runtime configuration updates
- LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries
Output
- TTSStartedFrame - Signals start of synthesis
- TTSAudioRawFrame - Generated audio data chunks (WAV headers automatically removed)
- TTSStoppedFrame - Signals completion of synthesis
- ErrorFrame - HTTP server or processing errors
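The note about WAV headers on TTSAudioRawFrame can be illustrated with the standard library. This is a minimal sketch, not Pipecat's actual implementation: it assumes the server replies with a complete WAV payload, and the helper names `wav_bytes_to_pcm` and `make_test_wav` are hypothetical.

```python
import io
import wave


def wav_bytes_to_pcm(wav_bytes: bytes) -> tuple[bytes, int, int]:
    """Return (raw PCM frames, sample rate, channel count) from a WAV payload."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        # readframes() returns only the sample data, so the header is dropped.
        pcm = wav.readframes(wav.getnframes())
        return pcm, wav.getframerate(), wav.getnchannels()


def make_test_wav(pcm: bytes, sample_rate: int = 22050) -> bytes:
    """Build a mono 16-bit WAV in memory, standing in for a server response."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


if __name__ == "__main__":
    payload = make_test_wav(b"\x00\x00" * 100, sample_rate=16000)
    pcm, rate, channels = wav_bytes_to_pcm(payload)
    print(len(payload), len(pcm), rate, channels)
```

Downstream processors then receive plain PCM bytes plus the sample rate and channel count, which is the information a raw audio frame needs to carry.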
Voice Models
Piper offers various pre-trained voice models with different qualities and languages:
English Models
- en_US-lessac-medium - Natural female voice, balanced quality
- en_US-ryan-high - High-quality male voice
- en_US-amy-medium - Clear female voice
- en_GB-alan-medium - British male voice
Quality Levels
- low - Fastest, smallest file size
- medium - Balanced quality and speed
- high - Best quality, larger models
Check the Piper voices repository for the complete list of available models and languages.
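The model names listed above follow a `language-name-quality` pattern, so the pieces can be recovered with a small parser. The sketch below assumes that three-part layout holds for the voice in question; `PiperVoice` and `parse_voice_id` are hypothetical names and not part of Piper or Pipecat.

```python
from typing import NamedTuple


class PiperVoice(NamedTuple):
    language: str  # e.g. "en_US"
    name: str      # e.g. "lessac"
    quality: str   # e.g. "medium"


def parse_voice_id(voice_id: str) -> PiperVoice:
    """Split an identifier such as 'en_US-lessac-medium' into its parts."""
    parts = voice_id.split("-")
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected language-name-quality, got {voice_id!r}")
    # First part is the language, last part is the quality,
    # and anything in between is treated as the voice name.
    return PiperVoice(parts[0], "-".join(parts[1:-1]), parts[-1])


if __name__ == "__main__":
    print(parse_voice_id("en_US-lessac-medium"))
    print(parse_voice_id("en_GB-alan-medium").language)
```

Parsing the identifier up front makes it easy to filter voices by language or quality before sending a request to the server.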
Supported Sample Rates
Piper supports multiple sample rates depending on the model quality:
- Low quality: 16kHz
- Medium quality: 22.05kHz
- High quality: 22.05kHz (the same rate as medium; high-quality models are larger rather than higher-rate)
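Rather than relying on the quality label, the rate can be read from the JSON config that is distributed next to each voice's `.onnx` file. The sketch below assumes the usual layout in which the config holds an `audio.sample_rate` field; `read_sample_rate` is a hypothetical helper, and the file name in the usage example is only an illustration.

```python
import json
from pathlib import Path


def read_sample_rate(config_path: str) -> int:
    """Return the sample rate (in Hz) recorded in a Piper voice config file."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    try:
        return int(config["audio"]["sample_rate"])
    except (KeyError, TypeError) as exc:
        # The config did not have the assumed {"audio": {"sample_rate": ...}} shape.
        raise ValueError(f"no audio.sample_rate in {config_path}") from exc


if __name__ == "__main__":
    # Illustrative path; point this at the config of the voice you downloaded.
    print(read_sample_rate("en_US-lessac-medium.onnx.json"))
```

Passing the value found here as the service's output sample rate avoids pitch or speed artifacts caused by a mismatch between the model and the pipeline.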
Usage Example
Basic Configuration
Initialize the Piper TTS service and use it in a pipeline.
Dynamic Voice Switching
You can switch voices dynamically by updating the voice_id parameter.
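The idea behind a runtime voice switch can be modelled without Pipecat installed. The sketch below is not the Pipecat API: `SettingsUpdate` and `TTSConfig` are hypothetical stand-ins for a TTSUpdateSettingsFrame payload and the service's internal settings, and the default server address is an assumption.

```python
from dataclasses import dataclass, field, fields


@dataclass
class SettingsUpdate:
    """Hypothetical stand-in for the payload of a TTSUpdateSettingsFrame."""
    settings: dict = field(default_factory=dict)


@dataclass
class TTSConfig:
    """Hypothetical holder for the settings a TTS service currently uses."""
    voice_id: str
    base_url: str = "http://localhost:5000"  # assumed local server address

    def apply(self, update: SettingsUpdate) -> None:
        """Apply an update, rejecting keys that are not known settings."""
        allowed = {f.name for f in fields(self)}
        unknown = set(update.settings) - allowed
        if unknown:
            raise KeyError(f"unknown settings: {sorted(unknown)}")
        for key, value in update.settings.items():
            setattr(self, key, value)


if __name__ == "__main__":
    config = TTSConfig(voice_id="en_US-lessac-medium")
    config.apply(SettingsUpdate({"voice_id": "en_US-ryan-high"}))
    print(config.voice_id)
```

In a real pipeline the update would arrive as a frame between utterances, so the new voice takes effect on the next piece of text rather than in the middle of a sentence. Consult the API Reference for the actual constructor arguments and settings keys.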
Metrics
The service reports the following metrics:
- Time to First Byte (TTFB) - Latency from text input to first audio
- Processing Duration - Total synthesis time
- Character Usage - Text processed for monitoring
Learn how to enable Metrics in your Pipeline.
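To make the three metrics concrete, the sketch below measures them around any function that yields audio chunks. It is an illustration of the definitions above rather than Pipecat's metrics code; `measure_synthesis` and `fake_synthesize` are hypothetical names, and the injectable `clock` parameter exists only to make the timing testable.

```python
import time
from typing import Callable, Iterable


def measure_synthesis(
    text: str,
    synthesize: Callable[[str], Iterable[bytes]],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Run synthesize(text) and return TTFB, processing time, and character usage."""
    start = clock()
    ttfb = None
    audio_bytes = 0
    for chunk in synthesize(text):
        if ttfb is None and chunk:
            # First non-empty chunk: latency from text input to first audio.
            ttfb = clock() - start
        audio_bytes += len(chunk)
    return {
        "ttfb": ttfb,                   # None if no audio was produced
        "processing": clock() - start,  # total synthesis time
        "characters": len(text),        # text processed
        "audio_bytes": audio_bytes,
    }


def fake_synthesize(text: str) -> Iterable[bytes]:
    """Stand-in generator that yields two small chunks of silence."""
    yield b"\x00\x00" * 4
    yield b"\x00\x00" * 4


if __name__ == "__main__":
    print(measure_synthesis("Hello from Piper", fake_synthesize))
```

Because TTFB is taken at the first non-empty chunk, a streaming server can score well on it even when total processing time is long.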
Additional Notes
- Self-Hosted: Complete control over TTS infrastructure and data privacy
- No API Keys: No external service dependencies or API costs
- Language Support: Multiple languages available through different voice models