`LLMTextFrame`s, which are used by subsequent processors to create audio output for the bot.
Pipeline Placement
The LLM instance should be placed after the user context aggregator and before any downstream services that depend on the LLM's output stream:

- Input: Receives `OpenAILLMContextFrame`s containing conversation history
- Processing:
  - Analyzes context and generates streaming response
  - Handles function calls if tools are available
  - Tracks token usage for metrics
- Output:
  - Denotes the start of the streaming response by pushing an `LLMFullResponseStartFrame`
  - Streams `LLMTextFrame`s containing response tokens to downstream processors (enables real-time TTS processing)
  - Ends with an `LLMFullResponseEndFrame` to mark the completion of the response
- Function calls:
  - `FunctionCallsStartedFrame`: Indicates function execution is beginning
  - `FunctionCallInProgressFrame`: Indicates a function is currently executing
  - `FunctionCallResultFrame`: Contains results from executed functions
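Putting this together, here is a minimal sketch of LLM placement in a pipeline. The `transport`, `stt`, and `tts` processors are assumed to be created elsewhere, and import paths may differ between Pipecat versions:

```python
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai.llm import OpenAILLMService

# Create the LLM service and a shared conversation context.
llm = OpenAILLMService(api_key="...", model="gpt-4o")
context = OpenAILLMContext(
    messages=[{"role": "system", "content": "You are a helpful voice assistant."}]
)
context_aggregator = llm.create_context_aggregator(context)

# The LLM sits after the user context aggregator and before TTS.
pipeline = Pipeline([
    transport.input(),               # user audio in
    stt,                             # speech-to-text
    context_aggregator.user(),       # adds user messages to the context
    llm,                             # streams LLMTextFrames downstream
    tts,                             # TTS consumes the token stream in real time
    transport.output(),              # bot audio out
    context_aggregator.assistant(),  # stores the completed assistant response
])
```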
Supported LLM Services
Pipecat supports a wide range of LLM providers to fit different needs, performance requirements, and budgets.

Text-Based LLMs

Most LLM services are built on the OpenAI chat completion specification for compatibility:

- OpenAI: GPT models with the original chat completion API
- Anthropic: Claude models with advanced reasoning capabilities
- Google Gemini: Multimodal capabilities with competitive performance
- AWS Bedrock: Enterprise-grade hosting for various foundation models
Many additional OpenAI-compatible providers can also be used by pointing a compatible service at their endpoint via the `base_url` parameter.
Speech-to-Speech Models
For lower latency, some providers offer direct speech-to-speech models:

- OpenAI Realtime: Direct speech input/output with GPT models
- Gemini Live: Real-time speech conversations with Gemini
- AWS Nova Sonic: Speech-optimized models on Bedrock
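As a rough sketch of how this simplifies the pipeline, a speech-to-speech service handles audio directly, so no separate STT or TTS stages are needed. The class name, import path, and voice setting below are indicative and may differ between Pipecat versions; `transport` and `context_aggregator` are assumed to be set up elsewhere:

```python
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.gemini_multimodal_live.gemini import GeminiMultimodalLiveLLMService

# One service covers both speech understanding and speech generation.
llm = GeminiMultimodalLiveLLMService(api_key="...", voice_id="Puck")

pipeline = Pipeline([
    transport.input(),               # user audio in
    context_aggregator.user(),
    llm,                             # speech in, speech out
    transport.output(),              # bot audio out
    context_aggregator.assistant(),
])
```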
See the Supported LLM Services reference for the complete list of supported language model providers.
LLM Service Architecture
BaseOpenAILLMService
Many LLM services use the OpenAI chat completion specification. Pipecat provides a `BaseOpenAILLMService` that most providers extend, enabling easy switching between compatible services:
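For example, here is a sketch of targeting an OpenAI-compatible endpoint by overriding `base_url`; the provider URL and model name are placeholders, not real endpoints:

```python
import os

from pipecat.services.openai.llm import OpenAILLMService

# Standard OpenAI endpoint.
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

# Any OpenAI-compatible provider, reached by overriding base_url.
# The endpoint and model below are placeholders for illustration.
llm = OpenAILLMService(
    api_key=os.getenv("PROVIDER_API_KEY"),
    base_url="https://api.example-provider.com/v1",
    model="example-model",
)
```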
LLM Configuration
Service-Specific Configuration
Each LLM service has its own configuration options. For example, configuring OpenAI with various parameters:
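A sketch of what that can look like; the `InputParams` fields shown are common OpenAI generation settings, and exact field names may vary between Pipecat releases:

```python
import os

from pipecat.services.openai.llm import OpenAILLMService

llm = OpenAILLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o",
    params=OpenAILLMService.InputParams(
        temperature=0.7,        # sampling randomness
        frequency_penalty=0.3,  # discourage repeated tokens
        max_tokens=1000,        # cap response length
    ),
)
```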
See the individual LLM service pages to explore configuration options for each supported provider.
Base Class Configuration
All LLM services inherit from the `LLMService` base class with shared configuration options:

- `run_in_parallel`: Controls whether function calls execute simultaneously or sequentially
  - `True` (default): Faster execution when multiple functions are called
  - `False`: Sequential execution for dependent function calls
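Continuing the OpenAI sketch above, the option is passed to the service constructor:

```python
# Execute tool calls one at a time, e.g. when one call depends on another's result.
llm = OpenAILLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o",
    run_in_parallel=False,
)
```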
Event Handlers
LLM services provide event handlers for monitoring the completion lifecycle:

- `on_completion_timeout`: Triggered when an LLM request times out
- `on_function_calls_started`: Triggered when function calls are initiated
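A sketch of registering these handlers with the decorator pattern Pipecat services expose; the handler signatures shown are indicative and may vary:

```python
from loguru import logger

@llm.event_handler("on_completion_timeout")
async def on_completion_timeout(service):
    # The LLM request exceeded its timeout; log it so it shows up in monitoring.
    logger.warning("LLM completion timed out")

@llm.event_handler("on_function_calls_started")
async def on_function_calls_started(service, function_calls):
    # function_calls describes the tool invocations the LLM requested.
    logger.info(f"LLM started {len(function_calls)} function call(s)")
```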
Function Calling
LLMs can call external functions to access real-time data and perform actions beyond their training data. This enables capabilities like checking weather, querying databases, or controlling external APIs. Function calls and their results are automatically stored in the conversation context by the context aggregator.
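As a brief preview, registration is a single call on the LLM service; the handler signature below follows one common Pipecat form, and the weather lookup is purely illustrative:

```python
# Register a handler the LLM can invoke when it decides to call "get_weather".
# The matching tool definition must also be declared in the context's tools.
async def fetch_weather(params):
    # A real implementation would call a weather API; static data for illustration.
    await params.result_callback({"conditions": "sunny", "temperature_f": 72})

llm.register_function("get_weather", fetch_weather)
```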
See the Function Calling guide to learn how to enable LLMs to interact with external services and APIs.
Key Takeaways
- Pipeline placement matters - LLM goes after user context, before TTS
- Token streaming enables real-time responses - no waiting for complete generation
- OpenAI compatibility enables easy provider switching
- Function calling extends capabilities beyond training data
- Configuration affects behavior - tune temperature, penalties, and limits
- Services are modular - swap providers without changing pipeline code
What’s Next
Now that you understand LLM configuration, let’s explore how function calling enables your bot to interact with external services and real-time data.
See the Function Calling guide to learn how to enable LLMs to interact with external services and APIs.