Deepgram

Overview

DeepgramSTTService provides real-time speech recognition using Deepgram’s WebSocket API with support for interim results, language detection, and voice activity detection (VAD).

API Reference: complete API documentation and method details

Deepgram Docs: official Deepgram documentation and features

Example Code: working example with interruption handling

Installation

To use DeepgramSTTService, install the required dependencies:

```bash
pip install "pipecat-ai[deepgram]"
```

You’ll also need to set your Deepgram API key in the `DEEPGRAM_API_KEY` environment variable.

Get your API key from the Deepgram Console.
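
A common pattern is to read the key from the environment at startup and fail fast when it is missing, instead of discovering the problem when the WebSocket connection is rejected. Only the `DEEPGRAM_API_KEY` variable name comes from this page; the helper name `require_deepgram_api_key` is hypothetical.

```python
import os


def require_deepgram_api_key() -> str:
    """Return DEEPGRAM_API_KEY from the environment, or raise if it is unset or blank.

    Hypothetical helper, not part of Pipecat.
    """
    # Strip surrounding whitespace so a blank value counts as missing
    key = os.environ.get("DEEPGRAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DEEPGRAM_API_KEY is not set; create a key in the Deepgram Console "
            "and export it before starting the pipeline."
        )
    return key
```

The returned value can then be passed as `api_key` when constructing DeepgramSTTService.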

Frames

Input

InputAudioRawFrame - Raw PCM audio data (16-bit, 16kHz, mono)
UserStartedSpeakingFrame - Triggers metrics collection
UserStoppedSpeakingFrame - Sends a finalize command so Deepgram flushes buffered audio and returns final results
STTUpdateSettingsFrame - Runtime transcription configuration updates
STTMuteFrame - Mute audio input for transcription
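
The input format fixes the size of each audio chunk: a 16-bit sample is 2 bytes and mono is one channel, so 16 kHz audio is 16,000 × 2 = 32,000 bytes per second, and a 20 ms chunk is 640 bytes. The sketch below does that arithmetic; the function names and the 20 ms chunk size are illustrative assumptions, not Pipecat APIs.

```python
def pcm_bytes(duration_ms: int, sample_rate: int = 16_000,
              channels: int = 1, sample_width: int = 2) -> int:
    """Bytes of raw PCM needed for `duration_ms` of audio (defaults: 16-bit, 16 kHz, mono)."""
    return sample_rate * channels * sample_width * duration_ms // 1000


def pcm_duration_ms(num_bytes: int, sample_rate: int = 16_000,
                    channels: int = 1, sample_width: int = 2) -> float:
    """Duration in milliseconds represented by `num_bytes` of raw PCM."""
    return num_bytes * 1000 / (sample_rate * channels * sample_width)
```

If you change the sample rate, pass it explicitly: at 8 kHz the same 20 ms chunk is only 320 bytes.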

Output

InterimTranscriptionFrame - Real-time transcription updates
TranscriptionFrame - Final transcription results
ErrorFrame - Connection or processing errors
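
A downstream consumer typically treats the two transcription frames differently: each interim result supersedes the previous one, while a final result is committed. The sketch below models that with plain dataclasses standing in for Pipecat's frame classes (they are hypothetical stand-ins, and only a `text` field is assumed); in a real pipeline this logic would live in a frame processor placed after `stt`.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for InterimTranscriptionFrame and TranscriptionFrame;
# only the `text` attribute is modeled.
@dataclass
class Interim:
    text: str


@dataclass
class Final:
    text: str


@dataclass
class TranscriptBuffer:
    """Keeps committed final segments plus the latest, replaceable interim result."""
    finals: list[str] = field(default_factory=list)
    interim: str = ""

    def handle(self, frame) -> None:
        if isinstance(frame, Final):
            # A final result commits the segment and clears the pending interim
            self.finals.append(frame.text)
            self.interim = ""
        elif isinstance(frame, Interim):
            # Each interim replaces the previous one rather than appending to it
            self.interim = frame.text

    def display(self) -> str:
        """Committed text followed by the current interim, if any."""
        return " ".join(part for part in [*self.finals, self.interim] if part)
```

Replacing rather than appending interim text avoids duplicated words on screen, because each interim result repeats and refines the text of the segment still being spoken.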

Models

Deepgram offers several models optimized for different use cases. Popular models include:

Model	Best For	Features
`nova-3-general`	General purpose, meetings	Latest accuracy, punctuation
`nova-2-general`	General purpose, meetings	Previous-generation general model, punctuation
`nova-2-phonecall`	Phone calls, low quality audio	Noise robust, telephony optimized

See Deepgram’s model docs for detailed performance metrics.
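
If the model is chosen per deployment (for example, telephony versus web calls), a small lookup keeps the choice in one place. The model names come from the table above; the use-case keys, the helper name `pick_model`, and the fallback to `nova-3-general` are assumptions for illustration. Check Deepgram's model docs to confirm the chosen model supports your target language.

```python
# Model names from the table above; the use-case keys are illustrative
MODELS_BY_USE_CASE = {
    "general": "nova-3-general",
    "meeting": "nova-3-general",
    "phonecall": "nova-2-phonecall",
}


def pick_model(use_case: str, default: str = "nova-3-general") -> str:
    """Return the Deepgram model for a use case, falling back to `default` (hypothetical helper)."""
    return MODELS_BY_USE_CASE.get(use_case.lower(), default)
```

The result would be passed as `model` in `LiveOptions`.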

Language Support

Deepgram STT supports the following languages and regional variants:

Language Code	Description	Service Codes
`Language.BG`	Bulgarian	`bg`
`Language.CA`	Catalan	`ca`
`Language.ZH`	Chinese (Mandarin, Simplified)	`zh`, `zh-CN`, `zh-Hans`
`Language.ZH_TW`	Chinese (Mandarin, Traditional)	`zh-TW`, `zh-Hant`
`Language.ZH_HK`	Chinese (Cantonese, Traditional)	`zh-HK`
`Language.CS`	Czech	`cs`
`Language.DA`	Danish	`da`, `da-DK`
`Language.NL`	Dutch	`nl`
`Language.NL_BE`	Dutch (Flemish)	`nl-BE`
`Language.EN`	English	`en`
`Language.EN_US`	English (US)	`en-US`
`Language.EN_AU`	English (Australia)	`en-AU`
`Language.EN_GB`	English (UK)	`en-GB`
`Language.EN_NZ`	English (New Zealand)	`en-NZ`
`Language.EN_IN`	English (India)	`en-IN`
`Language.ET`	Estonian	`et`
`Language.FI`	Finnish	`fi`
`Language.FR`	French	`fr`
`Language.FR_CA`	French (Canada)	`fr-CA`
`Language.DE`	German	`de`
`Language.DE_CH`	German (Switzerland)	`de-CH`
`Language.EL`	Greek	`el`
`Language.HI`	Hindi	`hi`
`Language.HU`	Hungarian	`hu`
`Language.ID`	Indonesian	`id`
`Language.IT`	Italian	`it`
`Language.JA`	Japanese	`ja`
`Language.KO`	Korean	`ko`, `ko-KR`
`Language.LV`	Latvian	`lv`
`Language.LT`	Lithuanian	`lt`
`Language.MS`	Malay	`ms`
`Language.NO`	Norwegian	`no`
`Language.PL`	Polish	`pl`
`Language.PT`	Portuguese	`pt`
`Language.PT_BR`	Portuguese (Brazil)	`pt-BR`
`Language.PT_PT`	Portuguese (Portugal)	`pt-PT`
`Language.RO`	Romanian	`ro`
`Language.RU`	Russian	`ru`
`Language.SK`	Slovak	`sk`
`Language.ES`	Spanish	`es`, `es-419`
`Language.SV`	Swedish	`sv`, `sv-SE`
`Language.TH`	Thai	`th`, `th-TH`
`Language.TR`	Turkish	`tr`
`Language.UK`	Ukrainian	`uk`
`Language.VI`	Vietnamese	`vi`
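
When the language comes from user input or a browser locale rather than the `Language` enum, it can help to normalize it against the service codes above before configuring the service. The resolver below is a hypothetical helper, not part of Pipecat, and it embeds only a subset of the table.

```python
from typing import Optional

# A subset of the service codes from the table above
SUPPORTED_CODES = {
    "en", "en-US", "en-GB", "en-AU", "en-NZ", "en-IN",
    "fr", "fr-CA", "de", "de-CH", "es", "es-419",
    "pt", "pt-BR", "pt-PT", "zh", "zh-CN", "zh-TW", "zh-HK",
}


def resolve_code(tag: str) -> Optional[str]:
    """Map a BCP-47-style tag to a supported service code.

    Tries a case-insensitive exact match first, then falls back to the base
    language subtag (e.g. "fr-BE" -> "fr"). Returns None if neither is supported.
    """
    by_lower = {code.lower(): code for code in SUPPORTED_CODES}
    exact = by_lower.get(tag.lower())
    if exact:
        return exact
    base = tag.split("-")[0].lower()
    return by_lower.get(base)
```

Falling back to the base subtag is a convenience, not always correct: a Cantonese tag missing from the set, such as `zh-MO`, would resolve to Mandarin `zh`, so you may prefer to reject unknown regional tags instead.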

Usage Example

Basic Configuration

Initialize the DeepgramSTTService and use it in a pipeline:

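```python
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.deepgram.stt import DeepgramSTTService, LiveOptions
from pipecat.transcriptions.language import Language

# Configure service; the key is read from the DEEPGRAM_API_KEY environment variable
stt = DeepgramSTTService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    live_options=LiveOptions(
        model="nova-3-general",
        language=Language.EN,
        smart_format=True,
    ),
)

# Use in pipeline; transport, context_aggregator, llm and tts are
# assumed to be created elsewhere in your application
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant(),
])
```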

Dynamic Configuration

Update settings dynamically by pushing an STTUpdateSettingsFrame:

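```python
from pipecat.frames.frames import STTUpdateSettingsFrame
from pipecat.transcriptions.language import Language

# Switch transcription to French while the pipeline is running
# (`task` is the PipelineTask running your pipeline)
await task.queue_frame(
    STTUpdateSettingsFrame(settings={"language": Language.FR})
)
```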

Metrics

The service provides:

Time to First Byte (TTFB) - Latency from audio input to first transcription
Processing Duration - Total time spent processing audio

Learn how to enable Metrics in your Pipeline.
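
TTFB is a difference between two timestamps: when user speech starts and when the first transcript for that turn arrives. The sketch below computes it with an injectable clock so it can be tested; the class name and the mapping of `start()` to UserStartedSpeakingFrame are assumptions, and Pipecat computes its own TTFB metric internally when metrics are enabled.

```python
import time
from typing import Callable, Optional


class TTFBTracker:
    """Measures the delay from the start of user speech to the first transcript.

    Hypothetical helper: call start() on UserStartedSpeakingFrame and
    on_transcript() on each transcription frame.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._started_at: Optional[float] = None

    def start(self) -> None:
        self._started_at = self._clock()

    def on_transcript(self) -> Optional[float]:
        """Return TTFB in seconds for the first transcript after start(), else None."""
        if self._started_at is None:
            return None
        ttfb = self._clock() - self._started_at
        self._started_at = None  # only the first transcript of a turn counts
        return ttfb
```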

Additional Notes

Connection Management: Automatically handles WebSocket connections and reconnections
VAD Integration: Supports Deepgram’s built-in VAD, though we recommend using local VAD services like Silero for better performance
Sample Rate: Can be configured per service, but we recommend setting it globally in PipelineParams for consistency across services
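
Setting the sample rate once at the pipeline level might look like the fragment below. It assumes a recent Pipecat release in which `PipelineParams` accepts `audio_in_sample_rate`, and it reuses the `pipeline` object from the Basic Configuration example, so it is not runnable on its own.

```python
from pipecat.pipeline.task import PipelineParams, PipelineTask

# `pipeline` is the Pipeline built in the Basic Configuration example
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        audio_in_sample_rate=16000,  # one input sample rate shared by all services
        enable_metrics=True,  # report the TTFB and processing metrics described above
    ),
)
```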
