Overview
SpeechmaticsSTTService enables real-time speech transcription using Speechmatics' WebSocket API, with partial and final results, speaker diarization, and end-of-utterance detection (VAD).
- API Reference: complete API documentation and method details
- Speechmatics Docs: official Speechmatics documentation and features
- Speaker Diarization: separating out different speakers in the audio
- Example Code: working example with interruption handling
Installation
To use SpeechmaticsSTTService, install the required dependencies:
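Speechmatics support is shipped as a Pipecat extra; assuming the standard extras naming, installation looks like:

```shell
pip install "pipecat-ai[speechmatics]"
```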
You'll also need to set up your Speechmatics API key as an environment variable: SPEECHMATICS_API_KEY.
Get your API key from the Speechmatics Portal.
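For example (the key value below is a placeholder):

```shell
# Make the key available to your Pipecat process
export SPEECHMATICS_API_KEY="your-api-key-here"
```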
Frames
Input
- InputAudioRawFrame - Raw PCM audio data (16-bit, 16kHz, mono)
Output
- InterimTranscriptionFrame - Real-time transcription updates
- TranscriptionFrame - Final transcription results
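A downstream processor can distinguish the two output frame types. This is a minimal sketch, assuming Pipecat's FrameProcessor API and the frame attributes named in this page (text, user_id); import paths may differ between Pipecat versions:

```python
from pipecat.frames.frames import Frame, InterimTranscriptionFrame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class TranscriptLogger(FrameProcessor):
    """Logs interim and final transcripts as they flow downstream."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, InterimTranscriptionFrame):
            print(f"(interim) {frame.text}")
        elif isinstance(frame, TranscriptionFrame):
            # user_id carries the speaker ID when diarization is enabled
            print(f"[{frame.user_id}] {frame.text}")
        await self.push_frame(frame, direction)
```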
Endpoints
Speechmatics STT supports the following endpoints (defaults to EU2):

| Region | Environment | STT Endpoint | Access |
|---|---|---|---|
| EU | EU1 | wss://neu.rt.speechmatics.com/ | Self-Service / Enterprise |
| EU | EU2 (Default) | wss://eu2.rt.speechmatics.com/ | Self-Service / Enterprise |
| US | US1 | wss://wus.rt.speechmatics.com/ | Enterprise |
Feature Discovery
To check the languages and features supported by Speechmatics STT, you can use the following code:

Language Support
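The feature-discovery check mentioned above can be sketched with a plain HTTP GET. The discovery URL and the response handling here are assumptions based on the EU2 endpoint listed earlier:

```python
import json
from urllib.request import urlopen

# Assumption: the RT feature-discovery endpoint lives under the EU2 host.
URL = "https://eu2.rt.speechmatics.com/v1/discovery/features"

with urlopen(URL, timeout=10) as resp:
    features = json.load(resp)

# Pretty-print whatever the service reports (languages, features, etc.).
print(json.dumps(features, indent=2))
```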
Refer to the Speechmatics docs for more information on supported languages.
Languages can be selected using the language parameter when creating the STT object. The exception to this is English / Mandarin, which has the code cmn_en and must be set using the language_code parameter.
Language Code | Description | Locales |
---|---|---|
Language.AR | Arabic | - |
Language.BA | Bashkir | - |
Language.EU | Basque | - |
Language.BE | Belarusian | - |
Language.BG | Bulgarian | - |
Language.BN | Bengali | - |
Language.YUE | Cantonese | - |
Language.CA | Catalan | - |
Language.HR | Croatian | - |
Language.CS | Czech | - |
Language.DA | Danish | - |
Language.NL | Dutch | - |
Language.EN | English | en-US, en-GB, en-AU |
Language.EO | Esperanto | - |
Language.ET | Estonian | - |
Language.FA | Persian | - |
Language.FI | Finnish | - |
Language.FR | French | - |
Language.GL | Galician | - |
Language.DE | German | - |
Language.EL | Greek | - |
Language.HE | Hebrew | - |
Language.HI | Hindi | - |
Language.HU | Hungarian | - |
Language.IA | Interlingua | - |
Language.IT | Italian | - |
Language.ID | Indonesian | - |
Language.GA | Irish | - |
Language.JA | Japanese | - |
Language.KO | Korean | - |
Language.LV | Latvian | - |
Language.LT | Lithuanian | - |
Language.MS | Malay | - |
Language.MT | Maltese | - |
Language.CMN | Mandarin | cmn-Hans, cmn-Hant |
Language.MR | Marathi | - |
Language.MN | Mongolian | - |
Language.NO | Norwegian | - |
Language.PL | Polish | - |
Language.PT | Portuguese | - |
Language.RO | Romanian | - |
Language.RU | Russian | - |
Language.SK | Slovakian | - |
Language.SL | Slovenian | - |
Language.ES | Spanish | - |
Language.SV | Swedish | - |
Language.SW | Swahili | - |
Language.TA | Tamil | - |
Language.TH | Thai | - |
Language.TR | Turkish | - |
Language.UG | Uyghur | - |
Language.UK | Ukrainian | - |
Language.UR | Urdu | - |
Language.VI | Vietnamese | - |
Language.CY | Welsh | - |
Bilingual transcription can be enabled using the language_code and domain parameters as follows:
Language Code | Description | Domain Options |
---|---|---|
cmn_en | English / Mandarin | - |
en_ms | English / Malay | - |
Language.ES | English / Spanish | bilingual-en |
en_ta | English / Tamil | - |
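Putting the two selection mechanisms together, here is a hedged sketch. The parameter names (language, language_code, domain) come from the tables above, but the exact constructor shape is an assumption and may instead live on a params object in your Pipecat version:

```python
import os

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language

# Monolingual: select via the language parameter.
stt_fr = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    language=Language.FR,
)

# Bilingual English / Mandarin: select via language_code instead.
stt_cmn_en = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    language_code="cmn_en",
)
```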
Speaker Diarization
Speechmatics STT supports speaker diarization, which separates out different speakers in the audio. The identity of each speaker is returned in the TranscriptionFrame objects in the user_id attribute.
To enable this feature, set enable_diarization to True. Additionally, if speaker_active_format or speaker_passive_format is provided, the text output for the TranscriptionFrame will be formatted to that specification. Your system context can then be updated to include information about this format so the LLM understands which speaker spoke which words. The passive format is optional: when the engine has been told to focus on specific speakers, all other speakers are formatted using speaker_passive_format.
- speaker_active_format -> the formatter for active speakers
- speaker_passive_format -> the formatter for passive / background speakers

For example:
- <{speaker_id}>{text}</{speaker_id}> -> <S1>Good morning.</S1>
- @{speaker_id}: {text} -> @S1: Good morning.
Available attributes
Attribute | Description | Example |
---|---|---|
speaker_id | The ID of the speaker | S1 |
text | The transcribed text | Good morning. |
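The two formatter strings are ordinary Python format templates over the attributes above, so their behavior can be checked directly:

```python
# The two example templates from this page
speaker_active_format = "<{speaker_id}>{text}</{speaker_id}>"
speaker_passive_format = "@{speaker_id}: {text}"

active = speaker_active_format.format(speaker_id="S1", text="Good morning.")
passive = speaker_passive_format.format(speaker_id="S2", text="Hello.")

print(active)   # <S1>Good morning.</S1>
print(passive)  # @S2: Hello.
```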
Usage Examples
Examples are included in the Pipecat project:
- Using Speechmatics STT service -> 07a-interruptible-speechmatics.py
- Using Speechmatics STT service with VAD -> 07b-interruptible-speechmatics-vad.py
- Transcribing with Speechmatics STT -> 13h-speechmatics-transcription.py
Basic Configuration
Initialize the SpeechmaticsSTTService and use it in a pipeline:
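A minimal sketch is shown below. The import paths and constructor parameters are assumptions about Pipecat's API, and the surrounding objects (transport, llm, tts, context_aggregator) are assumed to come from the rest of your application, as in the linked examples:

```python
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language

stt = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    language=Language.EN,
)

# Wire the STT service between the transport input and the rest of the pipeline.
pipeline = Pipeline([
    transport.input(),              # audio in from the transport
    stt,                            # speech to text
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant(),
])
```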
With Diarization
This will enable diarization and only forward text to the LLM when words are spoken by the first speaker (S1). Words from other speakers are transcribed, but are only sent when the first speaker speaks. When using the enable_vad option, speaker diarization is used to determine when a speaker is speaking. You will need to disable VAD options within the selected transport object to ensure this works correctly (see 07b-interruptible-speechmatics-vad.py as an example).
Initialize the SpeechmaticsSTTService and use it in a pipeline:
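A hedged sketch follows. The parameter names (enable_diarization, enable_vad, speaker_active_format, speaker_passive_format) come from the prose above, but whether they are top-level constructor arguments or live on a params object may differ in your Pipecat version:

```python
import os

from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.transcriptions.language import Language

stt = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    language=Language.EN,
    enable_diarization=True,
    # Use Speechmatics' diarization-aware end-of-utterance detection;
    # remember to disable the transport's own VAD when enabling this.
    enable_vad=True,
    # Format S1's words one way, background speakers another:
    speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
    speaker_passive_format="@{speaker_id}: {text}",
)
```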
Additional Notes
- Connection Management: Automatically handles WebSocket connections and reconnections
- Sample Rate: The default sample rate is 16000 Hz in pcm_s16le format
- VAD Integration: Optionally supports Speechmatics' built-in VAD and end-of-utterance detection