Rime

Overview

Rime AI provides two TTS service implementations:

RimeTTSService: WebSocket-based with word-level timing and interruption support
RimeHttpTTSService: HTTP-based for simpler use cases

RimeTTSService is recommended for real-time interactive applications.

API Reference

Complete API documentation and method details

Rime Docs

Official Rime WebSocket and HTTP API documentation

Example Code

Working example with word timestamps

Installation

To use Rime services, install the required dependencies:

pip install "pipecat-ai[rime]"

You’ll also need to set up your Rime API key as an environment variable: RIME_API_KEY.

Get your API key by signing up at Rime.

Frames

Input

TextFrame - Text content to synthesize into speech
TTSSpeakFrame - Text that should be spoken immediately
TTSUpdateSettingsFrame - Runtime configuration updates
LLMFullResponseStartFrame / LLMFullResponseEndFrame - LLM response boundaries

Output

TTSStartedFrame - Signals start of synthesis
TTSAudioRawFrame - Generated audio data chunks (PCM format)
TTSTextFrame - Word-level timing information (WebSocket service only)
TTSStoppedFrame - Signals completion of synthesis
ErrorFrame - API or processing errors

Service Comparison

Feature	RimeTTSService (WebSocket)	RimeHttpTTSService (HTTP)
Word Timestamps	✅ Precise timing	❌ Not available
Interruption	✅ Context tracking	⚠️ Basic support
Streaming	✅ Real-time chunks	✅ Chunked response
Inline Speed	❌ Not supported	✅ Word-level control
Arcana Model	❌ Not supported	✅ Latest model

Model Options

Model	Description	Availability
mistv2	Hyper-realistic conversational voices (recommended)	Both services
mist	Previous generation model	Both services
arcana	Latest high-quality model	HTTP only

Supported Sample Rates

WebSocket Service

Sample rates must be between 4000 Hz and 44100 Hz. Default: 24000 Hz.

HTTP Service

Sample rates must be between 8000 Hz and 96000 Hz. Default: 24000 Hz. Anything above 24000 Hz is up-sampling.

Language Support

Language Code	Description	Service Code
`Language.DE`	German	`ger`
`Language.EN`	English	`eng`
`Language.ES`	Spanish	`spa`
`Language.FR`	French	`fra`

Usage Example

WebSocket Service (Recommended)

Initialize the WebSocket service with your API key and desired voice:

from pipecat.services.rime.tts import RimeTTSService
from pipecat.transcriptions.language import Language
import os

# Configure WebSocket service
tts = RimeTTSService(
    api_key=os.getenv("RIME_API_KEY"),
    voice_id="rex",
    model="mistv2",
    params=RimeTTSService.InputParams(
        language=Language.EN,
        speed_alpha=1.0,
        reduce_latency=False,
        pause_between_brackets=True,
        phonemize_between_brackets=False
    )
)

# Use in pipeline
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,  # Word timestamps enable precise context updates
    transport.output(),
    context_aggregator.assistant()
])

HTTP Service

Initialize the HTTP service and use it in a pipeline:

import aiohttp
from pipecat.services.rime.tts import RimeHttpTTSService

# Configure HTTP service for batch processing
async with aiohttp.ClientSession() as session:
    http_tts = RimeHttpTTSService(
        api_key=os.getenv("RIME_API_KEY"),
        voice_id="eva",
        aiohttp_session=session,
        model="arcana",  # Latest model
        params=RimeHttpTTSService.InputParams(
            language=Language.EN,
            speed_alpha=1.2,
            inline_speed_alpha="0.8,1.5",  # Word-level speed control
            pause_between_brackets=True
        )
    )

Dynamic Configuration

Make settings updates by pushing a TTSUpdateSettingsFrame:

from pipecat.frames.frames import TTSUpdateSettingsFrame

await task.queue_frame(
    TTSUpdateSettingsFrame(settings={"voice": "your-new-voice-id"})
)

Metrics

Both services provide comprehensive metrics:

Time to First Byte (TTFB) - Latency from text input to first audio
Processing Duration - Total synthesis time
Character Usage - Text processed for billing

Learn how to enable Metrics in your Pipeline.

Additional Notes

WebSocket Recommended: Use RimeTTSService for interactive applications requiring word timestamps and precise context management
Context Tracking: WebSocket service maintains context across multiple messages within a turn
Text Aggregation: WebSocket service uses SkipTagsAggregator by default to handle Rime’s spell() tags
Model Selection: Use mistv2 for best balance of quality and performance, arcana for highest quality (HTTP only)
Advanced Controls: HTTP service supports more text markup features like inline speed control

API Reference

Services

Utilities

Frameworks

Pipeline

Overview

API Reference

Rime Docs

Example Code

Installation

Frames

Input

Output

Service Comparison

Model Options

Supported Sample Rates

WebSocket Service

HTTP Service

Language Support

Usage Example

WebSocket Service (Recommended)

HTTP Service

Dynamic Configuration

Metrics

Additional Notes

API Reference

Services

Utilities

Frameworks

Pipeline

​Overview

API Reference

Rime Docs

Example Code

​Installation

​Frames

​Input

​Output

​Service Comparison

​Model Options

​Supported Sample Rates

​WebSocket Service

​HTTP Service

​Language Support

​Usage Example

​WebSocket Service (Recommended)

​HTTP Service

​Dynamic Configuration

​Metrics

​Additional Notes

Overview

Installation

Frames

Input

Output

Service Comparison

Model Options

Supported Sample Rates

WebSocket Service

HTTP Service

Language Support

Usage Example

WebSocket Service (Recommended)

HTTP Service

Dynamic Configuration

Metrics

Additional Notes