Overview
SpeechmaticsSTTService enables real-time speech transcription using Speechmatics' WebSocket API, with partial and final results, speaker diarization, and end-of-utterance detection (VAD).
- API Reference: complete API documentation and method details
- Speechmatics Docs: official Speechmatics documentation and features
- Speaker Diarization: separating out different speakers in the audio
- Example Code: working example with interruption handling
Installation
To use SpeechmaticsSTTService, install the required dependencies:
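Speechmatics support is shipped as a Pipecat extra; assuming the standard extras naming, installation looks like:

```shell
pip install "pipecat-ai[speechmatics]"
```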
You'll also need to set up your Speechmatics API key as an environment variable: SPEECHMATICS_API_KEY.
Get your API key from the Speechmatics Portal.
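For example (the key value below is a placeholder):

```shell
# Make the key available to your Pipecat process
export SPEECHMATICS_API_KEY="your-api-key-here"
```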
Frames
Input
- InputAudioRawFrame - Raw PCM audio data (16-bit, 16kHz, mono)
Output
- InterimTranscriptionFrame - Real-time transcription updates
- TranscriptionFrame - Final transcription results
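A downstream processor can distinguish the two output frame types. This is a minimal sketch, assuming Pipecat's FrameProcessor API and the frame attributes named in this page (text, user_id); import paths may differ between Pipecat versions:

```python
from pipecat.frames.frames import Frame, InterimTranscriptionFrame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class TranscriptLogger(FrameProcessor):
    """Logs interim and final transcripts as they flow downstream."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, InterimTranscriptionFrame):
            print(f"(interim) {frame.text}")
        elif isinstance(frame, TranscriptionFrame):
            # user_id carries the speaker ID when diarization is enabled
            print(f"[{frame.user_id}] {frame.text}")
        await self.push_frame(frame, direction)
```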
Endpoints
Speechmatics STT supports the following endpoints (defaults to EU2):

| Region | Environment | STT Endpoint | Access |
|---|---|---|---|
| EU | EU1 | wss://neu.rt.speechmatics.com/ | Self-Service / Enterprise |
| EU | EU2 (Default) | wss://eu2.rt.speechmatics.com/ | Self-Service / Enterprise |
| US | US1 | wss://wus.rt.speechmatics.com/ | Enterprise |
Feature Discovery
To check the languages and features supported by Speechmatics STT, you can use the following code:

Language Support
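The feature-discovery check mentioned above can be sketched with a plain HTTP GET. The discovery URL and the response handling here are assumptions based on the EU2 endpoint listed earlier:

```python
import json
from urllib.request import urlopen

# Assumption: the RT feature-discovery endpoint lives under the EU2 host.
URL = "https://eu2.rt.speechmatics.com/v1/discovery/features"

with urlopen(URL, timeout=10) as resp:
    features = json.load(resp)

# Pretty-print whatever the service reports (languages, features, etc.).
print(json.dumps(features, indent=2))
```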
Refer to the Speechmatics docs for more information on supported languages.
Languages can be selected using the language parameter when creating the STT object. The exception to this is English / Mandarin, which has the code cmn_en and must be set using the language_code parameter.
Language Code | Description | Locales |
---|---|---|
Language.AR | Arabic | - |
Language.BA | Bashkir | - |
Language.EU | Basque | - |
Language.BE | Belarusian | - |
Language.BG | Bulgarian | - |
Language.BN | Bengali | - |
Language.YUE | Cantonese | - |
Language.CA | Catalan | - |
Language.HR | Croatian | - |
Language.CS | Czech | - |
Language.DA | Danish | - |
Language.NL | Dutch | - |
Language.EN | English | en-US, en-GB, en-AU |
Language.EO | Esperanto | - |
Language.ET | Estonian | - |
Language.FA | Persian | - |
Language.FI | Finnish | - |
Language.FR | French | - |
Language.GL | Galician | - |
Language.DE | German | - |
Language.EL | Greek | - |
Language.HE | Hebrew | - |
Language.HI | Hindi | - |
Language.HU | Hungarian | - |
Language.IA | Interlingua | - |
Language.IT | Italian | - |
Language.ID | Indonesian | - |
Language.GA | Irish | - |
Language.JA | Japanese | - |
Language.KO | Korean | - |
Language.LV | Latvian | - |
Language.LT | Lithuanian | - |
Language.MS | Malay | - |
Language.MT | Maltese | - |
Language.CMN | Mandarin | cmn-Hans, cmn-Hant |
Language.MR | Marathi | - |
Language.MN | Mongolian | - |
Language.NO | Norwegian | - |
Language.PL | Polish | - |
Language.PT | Portuguese | - |
Language.RO | Romanian | - |
Language.RU | Russian | - |
Language.SK | Slovakian | - |
Language.SL | Slovenian | - |
Language.ES | Spanish | - |
Language.SV | Swedish | - |
Language.SW | Swahili | - |
Language.TA | Tamil | - |
Language.TH | Thai | - |
Language.TR | Turkish | - |
Language.UG | Uyghur | - |
Language.UK | Ukrainian | - |
Language.UR | Urdu | - |
Language.VI | Vietnamese | - |
Language.CY | Welsh | - |
Bilingual transcription can be enabled using the language_code and domain parameters as follows:
Language Code | Description | Domain Options |
---|---|---|
cmn_en | English / Mandarin | - |
en_ms | English / Malay | - |
Language.ES | English / Spanish | bilingual-en |
en_ta | English / Tamil | - |
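Putting the two selection mechanisms together, here is a hedged sketch. The parameter names (language, language_code, domain) come from the tables above, but the exact constructor shape is an assumption and may instead live on a params object in your Pipecat version:

```python
import os

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language

# Monolingual: select via the language parameter.
stt_fr = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    language=Language.FR,
)

# Bilingual English / Mandarin: select via language_code instead.
stt_cmn_en = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    language_code="cmn_en",
)
```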
Speaker Diarization
Speechmatics STT supports speaker diarization, which separates out different speakers in the audio. The identity of each speaker is returned in the TranscriptionFrame objects in the user_id attribute.
To enable this feature, set enable_diarization to True. Additionally, if speaker_active_format or speaker_passive_format is provided, the text output for the TranscriptionFrame will be formatted to that specification. Your system context can then be updated to include information about this format so the LLM understands which speaker spoke which words. The passive format is optional: when the engine has been told to focus on specific speakers, all other speakers are formatted using speaker_passive_format.
- speaker_active_format -> the formatter for active speakers
- speaker_passive_format -> the formatter for passive / background speakers

For example:
- <{speaker_id}>{text}</{speaker_id}> -> <S1>Good morning.</S1>
- @{speaker_id}: {text} -> @S1: Good morning.
Available attributes
Attribute | Description | Example |
---|---|---|
speaker_id | The ID of the speaker | S1 |
text | The transcribed text | Good morning. |
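The two formatter strings are ordinary Python format templates over the attributes above, so their behavior can be checked directly:

```python
# The two example templates from this page
speaker_active_format = "<{speaker_id}>{text}</{speaker_id}>"
speaker_passive_format = "@{speaker_id}: {text}"

active = speaker_active_format.format(speaker_id="S1", text="Good morning.")
passive = speaker_passive_format.format(speaker_id="S2", text="Hello.")

print(active)   # <S1>Good morning.</S1>
print(passive)  # @S2: Hello.
```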
Usage Examples
Examples are included in the Pipecat project:
- Using Speechmatics STT service -> 07a-interruptible-speechmatics.py
- Using Speechmatics STT service with VAD -> 07b-interruptible-speechmatics-vad.py
- Transcribing with Speechmatics STT -> 13h-speechmatics-transcription.py
Basic Configuration
Initialize the SpeechmaticsSTTService and use it in a pipeline:
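A minimal sketch is shown below. The import paths and constructor parameters are assumptions about Pipecat's API, and the surrounding objects (transport, llm, tts, context_aggregator) are assumed to come from the rest of your application, as in the linked examples:

```python
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language

stt = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    language=Language.EN,
)

# Wire the STT service between the transport input and the rest of the pipeline.
pipeline = Pipeline([
    transport.input(),              # audio in from the transport
    stt,                            # speech to text
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant(),
])
```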
With Diarization
This will enable diarization and only forward text to the LLM when words are spoken by the first speaker (S1). Words from other speakers are transcribed, but are only sent when the first speaker speaks. When using the enable_vad option, speaker diarization is used to determine when a speaker is speaking. You will need to disable VAD options within the selected transport object to ensure this works correctly (see 07b-interruptible-speechmatics-vad.py as an example).
Initialize the SpeechmaticsSTTService and use it in a pipeline:
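A hedged sketch follows. The parameter names (enable_diarization, enable_vad, speaker_active_format, speaker_passive_format) come from the prose above, but whether they are top-level constructor arguments or live on a params object may differ in your Pipecat version:

```python
import os

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language

stt = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    language=Language.EN,
    enable_diarization=True,
    # Use Speechmatics' diarization-aware end-of-utterance detection;
    # remember to disable the transport's own VAD when enabling this.
    enable_vad=True,
    # Format S1's words one way, background speakers another:
    speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
    speaker_passive_format="@{speaker_id}: {text}",
)
```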
Additional Notes
- Connection Management: Automatically handles WebSocket connections and reconnections
- Sample Rate: The default sample rate is 16000 Hz in pcm_s16le format
- VAD Integration: Optionally supports Speechmatics' built-in VAD and end-of-utterance detection