Overview
Rime AI provides two TTS service implementations:
- RimeTTSService - WebSocket-based with word-level timing and interruption support
- RimeHttpTTSService - HTTP-based for simpler use cases
RimeTTSService is recommended for real-time interactive applications.
API Reference
Complete API documentation and method details
Rime Docs
Official Rime WebSocket and HTTP API documentation
Example Code
Working example with word timestamps
Installation
To use Rime services, install the required dependencies. You will also need to provide your Rime API key through the RIME_API_KEY environment variable.
Get your API key by signing up at Rime.
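One way to pick up the key at runtime is to read it from the environment and fail early when it is missing. This is a minimal sketch that assumes RIME_API_KEY is an environment variable; the helper name `load_rime_api_key` is hypothetical and not part of any Rime library.

```python
import os


def load_rime_api_key(env=None) -> str:
    """Return the Rime API key from RIME_API_KEY, or raise if it is unset."""
    # `env` defaults to the process environment; a plain dict can be passed
    # instead, which keeps the helper easy to test.
    env = os.environ if env is None else env
    key = env.get("RIME_API_KEY", "").strip()
    if not key:
        raise RuntimeError("RIME_API_KEY is not set")
    return key
```

Calling `load_rime_api_key()` once at startup surfaces a missing key before any synthesis request is attempted.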
Frames
Input
- TextFrame - Text content to synthesize into speech
- TTSSpeakFrame - Text that should be spoken immediately
- TTSUpdateSettingsFrame - Runtime configuration updates
- LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries
Output
- TTSStartedFrame - Signals start of synthesis
- TTSAudioRawFrame - Generated audio data chunks (PCM format)
- TTSTextFrame - Word-level timing information (WebSocket service only)
- TTSStoppedFrame - Signals completion of synthesis
- ErrorFrame - API or processing errors
Service Comparison
Feature | RimeTTSService (WebSocket) | RimeHttpTTSService (HTTP) |
---|---|---|
Word Timestamps | ✅ Precise timing | ❌ Not available |
Interruption | ✅ Context tracking | ⚠️ Basic support |
Streaming | ✅ Real-time chunks | ✅ Chunked response |
Inline Speed | ❌ Not supported | ✅ Word-level control |
Arcana Model | ❌ Not supported | ✅ Latest model |
Model Options
Model | Description | Availability |
---|---|---|
mistv2 | Hyper-realistic conversational voices (recommended) | Both services |
mist | Previous generation model | Both services |
arcana | Latest high-quality model | HTTP only |
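The comparison and model tables above imply a simple selection rule, which could be sketched as follows. The feature labels (`word_timestamps`, `context_tracking`, `inline_speed`) and the function `choose_service` are hypothetical names chosen for this illustration; only the service names, model names, and their availability come from the tables.

```python
# Features and models each service supports, transcribed from the tables above.
# The feature labels are illustrative, not identifiers from any library.
SERVICE_FEATURES = {
    "RimeTTSService": {"word_timestamps", "context_tracking"},
    "RimeHttpTTSService": {"inline_speed"},
}
SERVICE_MODELS = {
    "RimeTTSService": {"mistv2", "mist"},
    "RimeHttpTTSService": {"mistv2", "mist", "arcana"},
}


def choose_service(model: str, required_features=()) -> str:
    """Return the name of a service that supports the model and features."""
    needed = set(required_features)
    # The WebSocket service is tried first because it is the recommended one.
    for name in ("RimeTTSService", "RimeHttpTTSService"):
        if model in SERVICE_MODELS[name] and needed <= SERVICE_FEATURES[name]:
            return name
    raise ValueError(f"no service supports model={model!r} with {sorted(needed)}")
```

For example, `choose_service("arcana")` selects the HTTP service, while asking for `arcana` together with `word_timestamps` raises an error because no single service offers both.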
Supported Sample Rates
WebSocket Service
Sample rates must be between 4000 Hz and 44100 Hz. Default: 24000 Hz.
HTTP Service
Sample rates must be between 8000 Hz and 96000 Hz. Default: 24000 Hz. Anything above 24000 Hz is up-sampling.
Language Support
Language Code | Description | Service Code |
---|---|---|
Language.DE | German | ger |
Language.EN | English | eng |
Language.ES | Spanish | spa |
Language.FR | French | fra |
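The sample-rate limits and the language table above can be checked before a service is constructed. The sketch below assumes the stated ranges are inclusive and uses plain two-letter strings as stand-ins for the `Language` enum members; the helper names are hypothetical.

```python
# Allowed sample-rate ranges in Hz, as stated above (assumed inclusive).
SAMPLE_RATE_LIMITS = {"websocket": (4000, 44100), "http": (8000, 96000)}
DEFAULT_SAMPLE_RATE = 24000

# Two-letter codes (stand-ins for Language.DE, Language.EN, ...) mapped to
# the three-letter service codes from the table.
RIME_LANGUAGE_CODES = {"de": "ger", "en": "eng", "es": "spa", "fr": "fra"}


def check_sample_rate(transport: str, sample_rate: int = DEFAULT_SAMPLE_RATE) -> int:
    """Return the sample rate if it is valid for the transport, else raise."""
    low, high = SAMPLE_RATE_LIMITS[transport]
    if not low <= sample_rate <= high:
        raise ValueError(
            f"{transport} sample rate must be between {low} and {high} Hz, "
            f"got {sample_rate}"
        )
    return sample_rate


def to_rime_language(code: str) -> str:
    """Map a two-letter language code to the service code in the table."""
    key = code.lower()
    if key not in RIME_LANGUAGE_CODES:
        raise ValueError(f"unsupported language: {code!r}")
    return RIME_LANGUAGE_CODES[key]
```

Note that 48000 Hz is accepted for the HTTP service but rejected for the WebSocket service, whose upper limit is 44100 Hz.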
Usage Example
WebSocket Service (Recommended)
Initialize the WebSocket service with your API key and desired voice.
HTTP Service
Initialize the HTTP service and use it in a pipeline.
Dynamic Configuration
Make settings updates by pushing a TTSUpdateSettingsFrame.
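A settings update might be assembled as in the sketch below. It assumes that `TTSUpdateSettingsFrame` is importable from `pipecat.frames.frames` and takes a `settings` mapping, and that `voice` and `model` are accepted setting keys; verify all of these against the version you have installed. The helper names are hypothetical.

```python
from typing import Any, Dict, Optional


def build_tts_settings(
    voice: Optional[str] = None, model: Optional[str] = None
) -> Dict[str, Any]:
    """Collect only the settings that should change into a plain dict."""
    # "voice" and "model" are assumed key names; check your service's settings.
    candidates = {"voice": voice, "model": model}
    return {key: value for key, value in candidates.items() if value is not None}


def make_update_frame(settings: Dict[str, Any]):
    """Wrap the settings in a TTSUpdateSettingsFrame (requires Pipecat)."""
    # Assumed import path and constructor. The import is done lazily so that
    # build_tts_settings can be used and tested without Pipecat installed.
    from pipecat.frames.frames import TTSUpdateSettingsFrame

    return TTSUpdateSettingsFrame(settings=settings)
```

In a running pipeline the resulting frame would then be queued, for example with `await task.queue_frame(make_update_frame(build_tts_settings(model="mistv2")))`, where `task` is assumed to be the pipeline task object.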
Metrics
Both services provide comprehensive metrics:
- Time to First Byte (TTFB) - Latency from text input to first audio
- Processing Duration - Total synthesis time
- Character Usage - Text processed for billing
Learn how to enable Metrics in your Pipeline.
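To make the three metric definitions concrete, the sketch below derives them from timestamps recorded around a single synthesis request. It illustrates the definitions only and is not how the services compute metrics internally; the class `SynthesisTiming` is hypothetical, and counting characters with `len()` is an assumption about how usage is measured.

```python
from dataclasses import dataclass


@dataclass
class SynthesisTiming:
    """Timestamps (in seconds) recorded around one synthesis request."""

    text: str
    text_received_at: float  # when the text reached the service
    first_audio_at: float    # when the first audio chunk arrived
    finished_at: float       # when synthesis completed

    @property
    def ttfb(self) -> float:
        # Time to First Byte: latency from text input to first audio.
        return self.first_audio_at - self.text_received_at

    @property
    def processing_duration(self) -> float:
        # Total synthesis time for the request.
        return self.finished_at - self.text_received_at

    @property
    def character_usage(self) -> int:
        # Number of characters submitted for synthesis.
        return len(self.text)
```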
Additional Notes
- WebSocket Recommended: Use RimeTTSService for interactive applications requiring word timestamps and precise context management
- Context Tracking: WebSocket service maintains context across multiple messages within a turn
- Text Aggregation: WebSocket service uses SkipTagsAggregator by default to handle Rime's spell() tags
- Model Selection: Use mistv2 for best balance of quality and performance, arcana for highest quality (HTTP only)
- Advanced Controls: HTTP service supports more text markup features like inline speed control
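The text aggregation note can be illustrated with a small stand-in: the aggregator should not treat punctuation inside an unclosed spell() tag as the end of a sentence. The sketch below shows that idea only; it is not the SkipTagsAggregator implementation, and the class and function names are hypothetical.

```python
SPELL_OPEN, SPELL_CLOSE = "spell(", ")"
SENTENCE_ENDINGS = ".!?"


def inside_spell_tag(text: str) -> bool:
    """Return True if `text` ends inside an unclosed spell(...) tag."""
    last_open = text.rfind(SPELL_OPEN)
    if last_open == -1:
        return False
    # The tag is still open if no closing parenthesis follows the last opener.
    return SPELL_CLOSE not in text[last_open + len(SPELL_OPEN):]


class SpellAwareAggregator:
    """Buffers streamed text and releases it one complete sentence at a time."""

    def __init__(self) -> None:
        self._buffer = ""

    def add(self, chunk: str):
        """Append a chunk; return a sentence once one is complete, else None."""
        self._buffer += chunk
        candidate = self._buffer.strip()
        ends_sentence = bool(candidate) and candidate[-1] in SENTENCE_ENDINGS
        if ends_sentence and not inside_spell_tag(candidate):
            self._buffer = ""
            return candidate
        return None
```

Feeding the chunks `"My code is spell(A."`, `"B.C)"`, and `" today."` in that order returns nothing for the first two calls and the full sentence on the third, so the tag reaches the TTS service intact.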