Overview
GoogleLLMService provides integration with Google's Gemini models, supporting streaming responses, function calling, and multimodal inputs. It includes specialized context handling for Google's message format while maintaining compatibility with OpenAI-style contexts.
- API Reference: Complete API documentation and method details
- Gemini Docs: Official Google Gemini API documentation and features
- Example Code: Working example with function calling
Installation
To use GoogleLLMService, install the required dependencies. You'll also need to set your Google API key in the GOOGLE_API_KEY environment variable; get your API key from Google AI Studio.
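A typical setup might look like the following. The extras name (`google`) follows Pipecat's per-provider extras convention and is an assumption here; verify it against the API reference linked above.

```shell
# Install Pipecat with the Google extra (extras name assumed;
# check the API reference for the exact package spec).
pip install "pipecat-ai[google]"

# Export your Gemini API key so the service can pick it up.
export GOOGLE_API_KEY="your-api-key"
```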
Frames
Input
- OpenAILLMContextFrame: Conversation context and history
- LLMMessagesFrame: Direct message list
- VisionImageRawFrame: Images for vision processing
- LLMUpdateSettingsFrame: Runtime parameter updates
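For instance, a direct message list uses standard OpenAI-style role/content dicts, which the service translates into Gemini's message format internally. A minimal sketch (the frame wrapping is described only in comments, since it depends on your pipeline setup):

```python
# OpenAI-style messages: GoogleLLMService accepts this format and
# converts it to Gemini's message structure internally.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

# In a pipeline, such a list would be carried by an LLMMessagesFrame
# (see the frame list above) and pushed to the service.
```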
Output
- LLMFullResponseStartFrame / LLMFullResponseEndFrame: Response boundaries
- LLMTextFrame: Streamed completion chunks
- LLMSearchResponseFrame: Search grounding results with citations
- FunctionCallInProgressFrame / FunctionCallResultFrame: Function call lifecycle
- ErrorFrame: API or processing errors
Search Grounding
Google Gemini's search grounding feature enables real-time web search integration, allowing the model to access current information and provide citations. This is particularly valuable for applications requiring up-to-date information.

Enabling Search Grounding
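Enabling grounding can be sketched as a tool declaration passed to the service at construction time. The key names below (`google_search_retrieval`, `dynamic_retrieval_config`) follow the search-grounding tool schema used by Gemini 1.5-era models and are assumptions here; newer model generations use a simpler `google_search` tool, so consult the Gemini docs linked above for your model.

```python
# Hedged sketch: a search-grounding tool declaration in Gemini's format.
# Key names are assumptions; verify against the Gemini API docs.
search_tool = {
    "google_search_retrieval": {
        "dynamic_retrieval_config": {
            "mode": "MODE_DYNAMIC",
            # Only ground with search when the model estimates that
            # grounding is likely to help beyond this threshold.
            "dynamic_threshold": 0.3,
        }
    }
}

# The tool would then be supplied when constructing the service, e.g.
# GoogleLLMService(api_key=..., tools=[search_tool])  # constructor args assumed
```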
Handling Search Results
Search grounding produces LLMSearchResponseFrame frames with detailed citation information:
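As a sketch of what consuming those citations might look like, assuming the frame exposes a list of origin dicts with `site_title` and `site_uri` fields (these field names are guesses, not the real schema; see the API reference for the actual frame attributes):

```python
# Hedged sketch: summarize citations from a search-grounded response.
# The field names below are assumptions about LLMSearchResponseFrame's
# payload, not its actual schema.
def summarize_citations(origins: list) -> str:
    """Return one line per cited source, e.g. 'Wikipedia <https://...>'."""
    lines = []
    for origin in origins:
        title = origin.get("site_title", "unknown source")
        uri = origin.get("site_uri", "")
        lines.append(f"{title} <{uri}>" if uri else title)
    return "\n".join(lines)

sample = [
    {"site_title": "Wikipedia", "site_uri": "https://en.wikipedia.org/wiki/Paris"},
    {"site_title": "Britannica"},
]
```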
Function Calling
Function Calling Guide
Learn how to implement function calling with standardized schemas, register
handlers, manage context properly, and control execution flow in your
conversational AI applications.
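A function definition for Gemini follows the familiar JSON-schema style. Below is a minimal, hedged sketch of a tool declaration; the function name and parameters are purely illustrative, and the guide above covers Pipecat's standardized schema helpers for doing this properly.

```python
# Hedged sketch: a weather tool declaration in JSON-schema style.
# The function name and parameters are illustrative only.
weather_function = {
    "name": "get_current_weather",
    "description": "Get the current weather for a location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City and state, e.g. 'Austin, TX'",
            },
        },
        "required": ["location"],
    },
}
```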
Context Management
Context Management Guide
Learn how to manage conversation context, handle message history, and
integrate context aggregators for consistent conversational experiences.
Usage Example
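A hedged sketch of constructing the service follows. The import path, model name, and constructor parameters are assumptions based on this page's descriptions, not verified signatures; check the API reference above. The guarded import keeps the sketch self-contained when Pipecat isn't installed.

```python
# Hedged sketch: constructing GoogleLLMService for use in a Pipecat pipeline.
# Import path and parameter names are assumptions; verify against the
# API reference.
import os

try:
    from pipecat.services.google.llm import GoogleLLMService  # path assumed
except ImportError:
    GoogleLLMService = None  # Pipecat not installed; sketch only


def build_llm():
    if GoogleLLMService is None:
        return None
    return GoogleLLMService(
        api_key=os.environ.get("GOOGLE_API_KEY", ""),
        model="gemini-2.0-flash",  # model name assumed
        # Note: system messages are set at initialization, not in the
        # message list (see Additional Notes).
        system_instruction="You are a helpful assistant.",
    )
```

In a full pipeline, the returned service would sit between a context aggregator's user side and its assistant side, as covered in the Context Management guide above.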
Metrics
Google Gemini provides comprehensive usage tracking:

- Time to First Byte (TTFB): Response latency measurement
- Processing Duration: Total request processing time
- Token Usage: Prompt tokens, completion tokens, and totals
Learn how to enable Metrics in your Pipeline.
Additional Notes
- Multimodal Capabilities: Native support for text, images, audio, and video processing
- Search Grounding: Real-time web search with automatic citation and source attribution
- System Instructions: Handled differently than in OpenAI; set during service initialization rather than in the message list
- Vision Functions: Built-in support for image capture and analysis from video streams