Overview
`AssemblyAISTTService` provides real-time speech recognition using AssemblyAI's WebSocket API, with support for interim results, end-of-turn detection, and configurable audio processing parameters.
- API Reference - Complete API documentation and method details
- AssemblyAI Docs - Official AssemblyAI documentation and features
- Example Code - Working example with interruption handling
Installation
To use AssemblyAI services, install the required dependency (`pip install "pipecat-ai[assemblyai]"`) and set your API key in the `ASSEMBLYAI_API_KEY` environment variable. Get your API key from the AssemblyAI Console.
Frames
Input
- `InputAudioRawFrame` - Raw PCM audio data (16-bit, 16kHz, mono)
- `UserStartedSpeakingFrame` - VAD start signal (triggers TTFB metrics)
- `UserStoppedSpeakingFrame` - VAD stop signal (triggers force endpoint if enabled)
- `STTUpdateSettingsFrame` - Runtime transcription configuration updates
- `STTMuteFrame` - Mutes audio input for transcription
Output
- `InterimTranscriptionFrame` - Real-time transcription updates
- `TranscriptionFrame` - Final transcription results
- `TranslationFrame` - Translated text (if translation is enabled)
- `ErrorFrame` - Connection or processing errors
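A minimal sketch of consuming these frames downstream, assuming Pipecat's standard `FrameProcessor` base class; a processor like this would sit after the STT service in the pipeline:

```python
from pipecat.frames.frames import Frame, InterimTranscriptionFrame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class TranscriptLogger(FrameProcessor):
    """Logs interim and final transcriptions emitted by the STT service."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, InterimTranscriptionFrame):
            print(f"[interim] {frame.text}")
        elif isinstance(frame, TranscriptionFrame):
            print(f"[final] {frame.text}")

        # Always forward frames so the rest of the pipeline keeps running.
        await self.push_frame(frame, direction)
```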
Language Support
AssemblyAI Streaming STT currently supports English only.
Usage Example
Basic Configuration
Initialize the `AssemblyAISTTService` and use it in a pipeline:
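A minimal sketch, assuming the `pipecat.services.assemblyai.stt` import path and that `transport`, `llm`, and `tts` are configured elsewhere:

```python
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.assemblyai.stt import AssemblyAISTTService

# api_key is the only argument set here; other parameters
# (see the API Reference) keep their defaults.
stt = AssemblyAISTTService(api_key=os.getenv("ASSEMBLYAI_API_KEY"))

# Place the service between audio input and downstream processors.
pipeline = Pipeline([
    transport.input(),   # produces InputAudioRawFrame
    stt,                 # emits InterimTranscriptionFrame / TranscriptionFrame
    llm,
    tts,
    transport.output(),
])
```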
Dynamic Configuration
Update settings at runtime by pushing an `STTUpdateSettingsFrame` to the `AssemblyAISTTService`:
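A minimal sketch, assuming `task` is the `PipelineTask` running your pipeline; the settings key shown is hypothetical, so consult the API Reference for the settings the service actually accepts:

```python
from pipecat.frames.frames import STTUpdateSettingsFrame

# Queue a settings update onto the running pipeline.
# "end_of_turn_confidence_threshold" is an illustrative key, not a
# confirmed parameter name; check the API Reference before using it.
await task.queue_frame(
    STTUpdateSettingsFrame(settings={"end_of_turn_confidence_threshold": 0.7})
)
```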
Metrics
The service provides:
- Time to First Byte (TTFB) - Latency from speech start to first transcription
- Processing Duration - Total time spent processing audio
Learn how to enable Metrics in your Pipeline.
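A minimal sketch of enabling metrics, assuming the pipeline from the usage example above:

```python
from pipecat.pipeline.task import PipelineParams, PipelineTask

# enable_metrics turns on TTFB and processing-duration reporting
# for the services in this pipeline.
task = PipelineTask(
    pipeline,
    params=PipelineParams(enable_metrics=True),
)
```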
Additional Notes
- Connection Management: Automatically handles WebSocket connections, reconnections, and proper termination handshakes
- VAD Integration: Supports forcing an endpoint when VAD detects that speech has stopped; this requires a VAD processor in the pipeline (see the sketch below)
- Error Handling: Built-in handling for connection issues and message processing failures
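A minimal sketch of the VAD requirement, assuming a Daily transport and Pipecat's Silero VAD analyzer; `room_url` and `token` are placeholders:

```python
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.transports.services.daily import DailyParams, DailyTransport

# A VAD analyzer on the input transport emits the
# UserStartedSpeakingFrame / UserStoppedSpeakingFrame signals that the
# STT service uses for TTFB metrics and force endpoint triggering.
transport = DailyTransport(
    room_url,     # placeholder: your Daily room URL
    token,        # placeholder: your Daily room token
    "Voice Bot",  # bot display name
    DailyParams(
        audio_in_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),
    ),
)
```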