`LLMTextFrame`s, which are used by subsequent processors to create audio output for the bot.
Pipeline Placement
The LLM instance should be placed after the user context aggregator and before any downstream services that depend on the LLM's output stream:

- Input: Receives `OpenAILLMContextFrame`s containing conversation history
- Processing:
  - Analyzes context and generates streaming response
  - Handles function calls if tools are available
  - Tracks token usage for metrics
- Output:
  - Denotes the start of the streaming response by pushing an `LLMFullResponseStartFrame`
  - Streams `LLMTextFrame`s containing response tokens to downstream processors (enables real-time TTS processing)
  - Ends with an `LLMFullResponseEndFrame` to mark the completion of the response
- Function calls:
  - `FunctionCallsStartedFrame`: Indicates function execution is beginning
  - `FunctionCallInProgressFrame`: Indicates a function is currently executing
  - `FunctionCallResultFrame`: Contains results from executed functions
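Putting this together, here is a minimal sketch of LLM placement in a pipeline. The `transport`, `stt`, and `tts` processors are assumed to be created elsewhere, and import paths may differ between Pipecat versions:

```python
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai.llm import OpenAILLMService

# Create the LLM service and a shared conversation context.
llm = OpenAILLMService(api_key="...", model="gpt-4o")
context = OpenAILLMContext(
    messages=[{"role": "system", "content": "You are a helpful voice assistant."}]
)
context_aggregator = llm.create_context_aggregator(context)

# The LLM sits after the user context aggregator and before TTS.
pipeline = Pipeline([
    transport.input(),               # user audio in
    stt,                             # speech-to-text
    context_aggregator.user(),       # adds user messages to the context
    llm,                             # streams LLMTextFrames downstream
    tts,                             # TTS consumes the token stream in real time
    transport.output(),              # bot audio out
    context_aggregator.assistant(),  # stores the completed assistant response
])
```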
Supported LLM Services
Pipecat supports a wide range of LLM providers to fit different needs, performance requirements, and budgets.

Text-Based LLMs

Most LLM services are built on the OpenAI chat completion specification for compatibility:

- OpenAI: GPT models with the original chat completion API
- Anthropic: Claude models with advanced reasoning capabilities
- Google Gemini: Multimodal capabilities with competitive performance
- AWS Bedrock: Enterprise-grade hosting for various foundation models
Many additional OpenAI-compatible providers can also be used by pointing a compatible service at their endpoint via the `base_url` parameter.
Speech-to-Speech Models
For lower latency, some providers offer direct speech-to-speech models:

- OpenAI Realtime: Direct speech input/output with GPT models
- Gemini Live: Real-time speech conversations with Gemini
- AWS Nova Sonic: Speech-optimized models on Bedrock
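As a rough sketch of how this simplifies the pipeline, a speech-to-speech service handles audio directly, so no separate STT or TTS stages are needed. The class name, import path, and voice setting below are indicative and may differ between Pipecat versions; `transport` and `context_aggregator` are assumed to be set up elsewhere:

```python
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.gemini_multimodal_live.gemini import GeminiMultimodalLiveLLMService

# One service covers both speech understanding and speech generation.
llm = GeminiMultimodalLiveLLMService(api_key="...", voice_id="Puck")

pipeline = Pipeline([
    transport.input(),               # user audio in
    context_aggregator.user(),
    llm,                             # speech in, speech out
    transport.output(),              # bot audio out
    context_aggregator.assistant(),
])
```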
See the Supported LLM Services reference for the complete list of supported language model providers.
LLM Service Architecture
BaseOpenAILLMService
Many LLM services use the OpenAI chat completion specification. Pipecat provides a `BaseOpenAILLMService` that most providers extend, enabling easy switching between compatible services:
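For example, here is a sketch of targeting an OpenAI-compatible endpoint by overriding `base_url`; the provider URL and model name are placeholders, not real endpoints:

```python
import os

from pipecat.services.openai.llm import OpenAILLMService

# Standard OpenAI endpoint.
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

# Any OpenAI-compatible provider, reached by overriding base_url.
# The endpoint and model below are placeholders for illustration.
llm = OpenAILLMService(
    api_key=os.getenv("PROVIDER_API_KEY"),
    base_url="https://api.example-provider.com/v1",
    model="example-model",
)
```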
LLM Configuration
Service-Specific Configuration
Each LLM service has its own configuration options. For example, configuring OpenAI with various parameters:
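A sketch of what that can look like; the `InputParams` fields shown are common OpenAI generation settings, and exact field names may vary between Pipecat releases:

```python
import os

from pipecat.services.openai.llm import OpenAILLMService

llm = OpenAILLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o",
    params=OpenAILLMService.InputParams(
        temperature=0.7,        # sampling randomness
        frequency_penalty=0.3,  # discourage repeated tokens
        max_tokens=1000,        # cap response length
    ),
)
```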
See the individual LLM service pages to explore configuration options for each supported provider.
Base Class Configuration
All LLM services inherit from the `LLMService` base class with shared configuration options:

- `run_in_parallel`: Controls whether function calls execute simultaneously or sequentially
  - `True` (default): Faster execution when multiple functions are called
  - `False`: Sequential execution for dependent function calls
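Continuing the OpenAI sketch above, the option is passed to the service constructor:

```python
# Execute tool calls one at a time, e.g. when one call depends on another's result.
llm = OpenAILLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o",
    run_in_parallel=False,
)
```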
Event Handlers
LLM services provide event handlers for monitoring the completion lifecycle:

- `on_completion_timeout`: Triggered when an LLM request times out
- `on_function_calls_started`: Triggered when function calls are initiated
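A sketch of registering these handlers with the decorator pattern Pipecat services expose; the handler signatures shown are indicative and may vary:

```python
from loguru import logger

@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
    # The LLM request exceeded its timeout; log it so it shows up in monitoring.
    logger.warning("LLM completion timed out")

@llm.event_handler("on_function_calls_started")
async def on_function_calls_started(service, function_calls):
    # function_calls describes the tool invocations the LLM requested.
    logger.info(f"LLM started {len(function_calls)} function call(s)")
```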
Function Calling
LLMs can call external functions to access real-time data and perform actions beyond their training data. This enables capabilities like checking weather, querying databases, or controlling external APIs. Function calls and their results are automatically stored in the conversation context by the context aggregator.
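As a brief preview, registration is a single call on the LLM service; the handler signature below follows one common Pipecat form, and the weather lookup is purely illustrative:

```python
# Register a handler the LLM can invoke when it decides to call "get_weather".
# The matching tool definition must also be declared in the context's tools.
async def fetch_weather(params):
    # A real implementation would call a weather API; static data for illustration.
    await params.result_callback({"conditions": "sunny", "temperature_f": 72})

llm.register_function("get_weather", fetch_weather)
```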
See the Function Calling guide to learn how to enable LLMs to interact with external services and APIs.
Key Takeaways
- Pipeline placement matters - LLM goes after user context, before TTS
- Token streaming enables real-time responses - no waiting for complete generation
- OpenAI compatibility enables easy provider switching
- Function calling extends capabilities beyond training data
- Configuration affects behavior - tune temperature, penalties, and limits
- Services are modular - swap providers without changing pipeline code
What’s Next
Now that you understand LLM configuration, let’s explore how function calling enables your bot to interact with external services and real-time data.
See the Function Calling guide to learn how to enable LLMs to interact with external services and APIs.