Overview
DeepgramSTTService provides real-time speech recognition using Deepgram’s WebSocket API, with support for interim results, language detection, and voice activity detection (VAD).
- API Reference: Complete API documentation and method details
- Deepgram Docs: Official Deepgram documentation and features
- Example Code: Working example with interruption handling
Installation
To use DeepgramSTTService, install the required dependencies, then set your Deepgram API key in the DEEPGRAM_API_KEY environment variable.
Get your API key from the Deepgram Console.
Frames
Input
- InputAudioRawFrame - Raw PCM audio data (16-bit, 16 kHz, mono)
- UserStartedSpeakingFrame - Triggers metrics collection
- UserStoppedSpeakingFrame - Sends a finalize command to flush the session
- STTUpdateSettingsFrame - Runtime transcription configuration updates
- STTMuteFrame - Mutes audio input for transcription
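
To make the InputAudioRawFrame audio format concrete, the sketch below works out how many bytes a chunk of 16-bit, 16 kHz mono PCM occupies. The helper name pcm_chunk_bytes and the 20 ms chunk duration are illustrative assumptions, not part of the service's API.

```python
def pcm_chunk_bytes(duration_ms: int, sample_rate: int = 16000,
                    bits_per_sample: int = 16, channels: int = 1) -> int:
    """Return the byte size of a raw PCM chunk of the given duration."""
    bytes_per_sample = bits_per_sample // 8
    samples = sample_rate * duration_ms // 1000
    return samples * bytes_per_sample * channels


# 20 ms of 16-bit, 16 kHz mono audio (the chunk duration is a hypothetical choice).
chunk_size = pcm_chunk_bytes(20)

# An all-zero buffer of that size, i.e. one chunk of digital silence.
silence = bytes(chunk_size)
```

If your audio source produces a different sample rate or channel count, pass those values explicitly so the computed size matches the data you actually send.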
Output
- InterimTranscriptionFrame - Real-time transcription updates
- TranscriptionFrame - Final transcription results
- ErrorFrame - Connection or processing errors
Models
Deepgram offers several models optimized for different use cases. Popular models include:

Model | Best For | Features |
---|---|---|
nova-3-general | General purpose, meetings | Latest accuracy, punctuation |
nova-2-general | General purpose, meetings | Latest accuracy, punctuation |
nova-2-phonecall | Phone calls, low quality audio | Noise robust, telephony optimized |
Language Support
Deepgram STT supports the following languages and regional variants:

Language Code | Description | Service Codes |
---|---|---|
Language.BG | Bulgarian | bg |
Language.CA | Catalan | ca |
Language.ZH | Chinese (Mandarin, Simplified) | zh , zh-CN , zh-Hans |
Language.ZH_TW | Chinese (Mandarin, Traditional) | zh-TW , zh-Hant |
Language.ZH_HK | Chinese (Cantonese, Traditional) | zh-HK |
Language.CS | Czech | cs |
Language.DA | Danish | da , da-DK |
Language.NL | Dutch | nl |
Language.NL_BE | Dutch (Flemish) | nl-BE |
Language.EN | English | en |
Language.EN_US | English (US) | en-US |
Language.EN_AU | English (Australia) | en-AU |
Language.EN_GB | English (UK) | en-GB |
Language.EN_NZ | English (New Zealand) | en-NZ |
Language.EN_IN | English (India) | en-IN |
Language.ET | Estonian | et |
Language.FI | Finnish | fi |
Language.FR | French | fr |
Language.FR_CA | French (Canada) | fr-CA |
Language.DE | German | de |
Language.DE_CH | German (Switzerland) | de-CH |
Language.EL | Greek | el |
Language.HI | Hindi | hi |
Language.HU | Hungarian | hu |
Language.ID | Indonesian | id |
Language.IT | Italian | it |
Language.JA | Japanese | ja |
Language.KO | Korean | ko , ko-KR |
Language.LV | Latvian | lv |
Language.LT | Lithuanian | lt |
Language.MS | Malay | ms |
Language.NO | Norwegian | no |
Language.PL | Polish | pl |
Language.PT | Portuguese | pt |
Language.PT_BR | Portuguese (Brazil) | pt-BR |
Language.PT_PT | Portuguese (Portugal) | pt-PT |
Language.RO | Romanian | ro |
Language.RU | Russian | ru |
Language.SK | Slovak | sk |
Language.ES | Spanish | es , es-419 |
Language.SV | Swedish | sv , sv-SE |
Language.TH | Thai | th , th-TH |
Language.TR | Turkish | tr |
Language.UK | Ukrainian | uk |
Language.VI | Vietnamese | vi |
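
As a rough illustration of how the table can be used, the sketch below maps a service code back to its Language member name. It covers only a subset of the rows. The SERVICE_CODES dict, the resolve_language helper, and the fallback to the base language for unlisted regional variants are all assumptions made for this example, not the library's own lookup logic.

```python
from typing import Optional

# Subset of the table above: Language member name -> accepted service codes.
# Illustrative only; extend it with the remaining rows as needed.
SERVICE_CODES = {
    "ZH": ["zh", "zh-CN", "zh-Hans"],
    "ZH_TW": ["zh-TW", "zh-Hant"],
    "EN": ["en"],
    "EN_US": ["en-US"],
    "ES": ["es", "es-419"],
    "PT": ["pt"],
    "PT_BR": ["pt-BR"],
}

# Reverse index keyed by lowercase service code for case-insensitive lookup.
_CODE_TO_MEMBER = {
    code.lower(): member
    for member, codes in SERVICE_CODES.items()
    for code in codes
}


def resolve_language(code: str) -> Optional[str]:
    """Return the Language member name for a service code, or None if unknown."""
    normalized = code.strip().lower()
    if normalized in _CODE_TO_MEMBER:
        return _CODE_TO_MEMBER[normalized]
    # Assumed fallback: try the base language when the regional variant is not listed.
    base = normalized.split("-")[0]
    return _CODE_TO_MEMBER.get(base)
```

Falling back to the base language is a design choice of this sketch. Check the API reference before relying on any particular behavior for codes that are not in the table.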
Usage Example
Basic Configuration
Initialize the DeepgramSTTService and use it in a pipeline.
Dynamic Configuration
Update settings dynamically by pushing an STTUpdateSettingsFrame.
Metrics
The service provides:
- Time to First Byte (TTFB) - Latency from audio input to first transcription
- Processing Duration - Total time spent processing audio
Learn how to enable Metrics in your Pipeline.
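
The two metrics can be pictured with a small timer, sketched below. This is a toy model of the definitions above, not the service's implementation. The SttMetrics class and its method names are hypothetical, and the injectable clock exists only to make the example easy to test.

```python
import time
from typing import Callable, Optional


class SttMetrics:
    """Toy timer for TTFB and processing duration (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start: Optional[float] = None
        self.ttfb: Optional[float] = None
        self.processing: Optional[float] = None

    def start(self) -> None:
        """Mark the moment audio input begins and reset earlier readings."""
        self._start = self._clock()
        self.ttfb = None
        self.processing = None

    def on_transcription(self) -> None:
        """Record TTFB on the first transcription only; later calls are ignored."""
        if self._start is not None and self.ttfb is None:
            self.ttfb = self._clock() - self._start

    def stop(self) -> None:
        """Record the total time elapsed since start()."""
        if self._start is not None:
            self.processing = self._clock() - self._start


# Usage with the real monotonic clock.
metrics = SttMetrics()
metrics.start()
metrics.on_transcription()
metrics.stop()
```

A monotonic clock is used because wall-clock time can jump backwards or forwards, which would distort short latency measurements.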
Additional Notes
- Connection Management: Automatically handles WebSocket connections and reconnections
- VAD Integration: Supports Deepgram’s built-in VAD, though we recommend using local VAD services like Silero for better performance
- Sample Rate: Can be configured per service, but we recommend setting it globally in PipelineParams for consistency across services