Overview
`AssemblyAISTTService` provides real-time speech recognition using AssemblyAI's WebSocket API, with support for interim results, end-of-turn detection, and configurable audio processing parameters.
- API Reference - Complete API documentation and method details
- AssemblyAI Docs - Official AssemblyAI documentation and features
- Example Code - Working example with interruption handling
Installation
To use AssemblyAI services, install the required dependency (`pip install "pipecat-ai[assemblyai]"`) and set your API key in the `ASSEMBLYAI_API_KEY` environment variable. Get your API key from the AssemblyAI Console.
Frames
Input
- `InputAudioRawFrame` - Raw PCM audio data (16-bit, 16kHz, mono)
- `UserStartedSpeakingFrame` - VAD start signal (triggers TTFB metrics)
- `UserStoppedSpeakingFrame` - VAD stop signal (triggers force endpoint if enabled)
- `STTUpdateSettingsFrame` - Runtime transcription configuration updates
- `STTMuteFrame` - Mutes audio input for transcription
Output
- `InterimTranscriptionFrame` - Real-time transcription updates
- `TranscriptionFrame` - Final transcription results
- `TranslationFrame` - Translated text (if translation is enabled)
- `ErrorFrame` - Connection or processing errors
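A minimal sketch of consuming these frames downstream, assuming Pipecat's standard `FrameProcessor` base class; a processor like this would sit after the STT service in the pipeline:

```python
from pipecat.frames.frames import Frame, InterimTranscriptionFrame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class TranscriptLogger(FrameProcessor):
    """Logs interim and final transcriptions emitted by the STT service."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, InterimTranscriptionFrame):
            print(f"[interim] {frame.text}")
        elif isinstance(frame, TranscriptionFrame):
            print(f"[final] {frame.text}")

        # Always forward frames so the rest of the pipeline keeps running.
        await self.push_frame(frame, direction)
```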
Language Support
AssemblyAI Streaming STT currently supports English only.
Usage Example
Basic Configuration
Initialize the `AssemblyAISTTService` and use it in a pipeline:
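A minimal sketch, assuming the `pipecat.services.assemblyai.stt` import path and that `transport`, `llm`, and `tts` are configured elsewhere:

```python
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.assemblyai.stt import AssemblyAISTTService

# api_key is the only argument set here; other parameters
# (see the API Reference) keep their defaults.
stt = AssemblyAISTTService(api_key=os.getenv("ASSEMBLYAI_API_KEY"))

# Place the service between audio input and downstream processors.
pipeline = Pipeline([
    transport.input(),   # produces InputAudioRawFrame
    stt,                 # emits InterimTranscriptionFrame / TranscriptionFrame
    llm,
    tts,
    transport.output(),
])
```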
Dynamic Configuration
Update settings at runtime by pushing an `STTUpdateSettingsFrame` to the `AssemblyAISTTService`:
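A minimal sketch, assuming `task` is the `PipelineTask` running your pipeline; the settings key shown is hypothetical, so consult the API Reference for the settings the service actually accepts:

```python
from pipecat.frames.frames import STTUpdateSettingsFrame

# Queue a settings update onto the running pipeline.
# "end_of_turn_confidence_threshold" is an illustrative key, not a
# confirmed parameter name; check the API Reference before using it.
await task.queue_frame(
    STTUpdateSettingsFrame(settings={"end_of_turn_confidence_threshold": 0.7})
)
```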
Metrics
The service provides:
- Time to First Byte (TTFB) - Latency from speech start to first transcription
- Processing Duration - Total time spent processing audio
Learn how to enable Metrics in your Pipeline.
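A minimal sketch of enabling metrics, assuming the pipeline from the usage example above:

```python
from pipecat.pipeline.task import PipelineParams, PipelineTask

# enable_metrics turns on TTFB and processing-duration reporting
# for the services in this pipeline.
task = PipelineTask(
    pipeline,
    params=PipelineParams(enable_metrics=True),
)
```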
Additional Notes
- Connection Management: Automatically handles WebSocket connections, reconnections, and proper termination handshakes
- VAD Integration: Supports forcing an endpoint when VAD detects that speech has stopped; this requires a VAD processor in the pipeline (see the sketch below)
- Error Handling: Built-in handling for connection issues and message processing failures
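A minimal sketch of the VAD requirement, assuming a Daily transport and Pipecat's Silero VAD analyzer; `room_url` and `token` are placeholders:

```python
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.transports.services.daily import DailyParams, DailyTransport

# A VAD analyzer on the input transport emits the
# UserStartedSpeakingFrame / UserStoppedSpeakingFrame signals that the
# STT service uses for TTFB metrics and force endpoint triggering.
transport = DailyTransport(
    room_url,     # placeholder: your Daily room URL
    token,        # placeholder: your Daily room token
    "Voice Bot",  # bot display name
    DailyParams(
        audio_in_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),
    ),
)
```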