What is Context in Pipecat?
In Pipecat, context refers to the conversation history that the LLM uses to generate responses. The context consists of a list of alternating user/assistant messages that represents the collective history of the entire conversation.

How Context Updates During Conversations
Context updates happen automatically as frames flow through your pipeline:

User Messages:
- User speaks → `InputAudioRawFrame` → STT Service → `TranscriptionFrame`
- `context_aggregator.user()` receives the `TranscriptionFrame` and adds a user message to the context

Assistant Messages:
- LLM generates a response → `LLMTextFrame` → TTS Service → `TTSTextFrame`
- `context_aggregator.assistant()` receives the `TTSTextFrame` and adds an assistant message to the context
The key frame types in this flow:

- `TranscriptionFrame`: contains user speech converted to text by the STT service
- `LLMTextFrame`: contains LLM-generated responses
- `TTSTextFrame`: contains bot responses converted to text by the TTS service (represents what was actually spoken)
The TTS service processes `LLMTextFrame`s but outputs `TTSTextFrame`s, which represent the actual spoken text returned by the TTS provider. This ensures the context matches what users actually hear.

Setting Up Context Management
Pipecat includes a context aggregator that creates and manages context for both user and assistant messages:

1. Create the Context and Context Aggregator
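A minimal sketch of this step, assuming Pipecat's OpenAI-style context API (`OpenAILLMContext` plus the LLM service's `create_context_aggregator()` method); exact import paths may vary between Pipecat versions:

```python
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai.llm import OpenAILLMService

# Seed the context with a system message; user/assistant turns
# accumulate here automatically as the conversation progresses.
messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant. Keep responses short.",
    }
]

llm = OpenAILLMService(api_key="...")  # any supported LLM service works

context = OpenAILLMContext(messages)

# The aggregator pair (user + assistant) keeps the context in sync
# with frames flowing through the pipeline.
context_aggregator = llm.create_context_aggregator(context)
```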
2. Context with Function Calling
Context can also include tools (function definitions) that the LLM can call during conversations.

We’ll cover function calling in detail in an upcoming section. The context aggregator handles function call storage automatically.
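As a hedged sketch, tools can be passed alongside the messages when the context is created. The `get_weather` tool below is purely illustrative, and `messages` and `llm` are assumed to be defined as in a typical Pipecat setup:

```python
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext

# Illustrative OpenAI-style tool definition; the name and schema are examples.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
```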
3. Add Context Aggregators to Your Pipeline
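A sketch of the canonical ordering, assuming the transport, STT, LLM, and TTS services are already constructed (the variable names here are placeholders):

```python
from pipecat.pipeline.pipeline import Pipeline

pipeline = Pipeline(
    [
        transport.input(),               # audio in from the user
        stt,                             # speech-to-text
        context_aggregator.user(),       # collects TranscriptionFrames
        llm,                             # generates the response
        tts,                             # text-to-speech
        transport.output(),              # audio out to the user
        context_aggregator.assistant(),  # collects TTSTextFrames
    ]
)
```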
Context Aggregator Placement
The placement of context aggregator instances in your pipeline is crucial for proper operation:

User Context Aggregator

Place the user context aggregator downstream from the STT service. Since the user’s speech results in `TranscriptionFrame` objects pushed by the STT service, the user aggregator needs to be positioned to collect these frames.
Assistant Context Aggregator
Place the assistant context aggregator after `transport.output()`. This positioning is important because:

- The TTS service outputs `TTSTextFrame`s in addition to audio
- The assistant aggregator must be downstream to collect those frames
- It ensures context updates happen word-by-word for specific services (e.g. Cartesia, ElevenLabs, and Rime)
- Your context stays updated at the word level in case an interruption occurs

Always place the assistant context aggregator after `transport.output()` to ensure proper word-level context updates during interruptions.

Manual Context Control
You can programmatically add new messages to the context by pushing or queueing specific frames.

Adding Messages

- `LLMMessagesAppendFrame`: appends a new message to the existing context
- `LLMMessagesUpdateFrame`: completely replaces the existing context with new messages
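The two frames differ exactly the way list append differs from wholesale replacement. This pure-Python illustration (not Pipecat code) mimics the effect each frame has on the stored message list:

```python
# Plain-Python model of the context: a list of role/content dicts.
context = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# Effect of LLMMessagesAppendFrame: new messages are added at the end.
append_messages = [{"role": "user", "content": "What's the weather?"}]
context_after_append = context + append_messages

# Effect of LLMMessagesUpdateFrame: the context is replaced wholesale.
update_messages = [{"role": "system", "content": "Start a new topic."}]
context_after_update = update_messages

print(len(context_after_append))  # 3: original messages kept
print(len(context_after_update))  # 1: original messages discarded
```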
Retrieving Current Context
The context aggregator provides a `get_context_frame()` method to obtain the latest context:
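A brief sketch, assuming a running `PipelineTask` named `task` and the aggregator pair from the earlier setup:

```python
# Wrap the current context in a frame that can be pushed downstream.
context_frame = context_aggregator.user().get_context_frame()

# Queue it on the pipeline task to send it through the pipeline.
await task.queue_frames([context_frame])
```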
Triggering Bot Responses
You’ll commonly use this manual mechanism of obtaining the current context and pushing or queueing it to trigger the bot to speak in two scenarios:

- Starting a pipeline where the bot should speak first
- After pushing new context frames using `LLMMessagesAppendFrame` or `LLMMessagesUpdateFrame`
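For example, to have the bot speak first, a common pattern is to queue the current context when the client connects. This sketch assumes a transport that emits an `on_client_connected` event and a `PipelineTask` named `task`:

```python
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
    # Push the current context to the LLM so the bot produces a greeting.
    await task.queue_frames([context_aggregator.user().get_context_frame()])
```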
Key Takeaways
- Context is conversation history - automatically maintained as users and bots exchange messages
- Frame types matter - `TranscriptionFrame` for users, `TTSTextFrame` for assistants
- Placement matters - user aggregator after STT, assistant aggregator after transport output
- Tools are included - function definitions and results are stored in context
- Manual control available - use frames to append messages or trigger responses when needed
- Word-level precision - proper placement ensures context accuracy during interruptions
What’s Next
Now that you understand context management, let’s explore how to configure the LLM services that process this context to generate intelligent responses.

LLM Inference
Learn how to configure language models in your voice AI pipeline