Overview
CerebrasLLMService
provides access to Cerebras’s language models through an OpenAI-compatible interface. It inherits from OpenAILLMService
and supports streaming responses, function calling, and context management.
API Reference
Complete API documentation and method details
Cerebras Docs
Official Cerebras inference API documentation
Example Code
Working example with function calling
Installation
To use Cerebras services, install the required dependency:CEREBRAS_API_KEY
.
Get your API key from Cerebras Cloud.
Frames
Input
OpenAILLMContextFrame
- Conversation context and historyLLMMessagesFrame
- Direct message listVisionImageRawFrame
- Images for vision processingLLMUpdateSettingsFrame
- Runtime parameter updates
Output
LLMFullResponseStartFrame
/LLMFullResponseEndFrame
- Response boundariesLLMTextFrame
- Streamed completion chunksFunctionCallInProgressFrame
/FunctionCallResultFrame
- Function call lifecycleErrorFrame
- API or processing errors
Function Calling
Function Calling Guide
Learn how to implement function calling with standardized schemas, register
handlers, manage context properly, and control execution flow in your
conversational AI applications.
Context Management
Context Management Guide
Learn how to manage conversation context, handle message history, and
integrate context aggregators for consistent conversational experiences.
Usage Example
Metrics
Inherits all OpenAI-compatible metrics:- Time to First Byte (TTFB) - Ultra-low latency measurement
- Processing Duration - Total request processing time
- Token Usage - Prompt tokens, completion tokens, and totals
Learn how to enable Metrics in your Pipeline.
Additional Notes
- OpenAI Compatibility: Full compatibility with OpenAI API parameters and responses
- Streaming Responses: All responses are streamed for minimal latency
- Function Calling: Full support for OpenAI-style tool calling
- Open Source Models: Access to latest Llama models with commercial licensing