Overview
`NimLLMService` provides access to NVIDIA's NIM language models through an OpenAI-compatible interface. It inherits from `OpenAILLMService` and supports streaming responses, function calling, and context management, with special handling for NVIDIA's incremental token reporting.
- API Reference - Complete API documentation and method details
- NVIDIA NIM Docs - Official NVIDIA NIM documentation and setup
- Example Code - Working example with function calling
Installation
To use NVIDIA NIM services, install the required dependencies and set the `NVIDIA_API_KEY` environment variable. Get your API key from NVIDIA Build.
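The setup can be sketched as below; the `[nim]` extra name is an assumption based on Pipecat's convention for optional service dependencies, so check the API reference if the install fails:

```shell
# Install Pipecat with NIM support (extra name is an assumption)
pip install "pipecat-ai[nim]"

# Make your NVIDIA API key available to the service
export NVIDIA_API_KEY=your-api-key-here
```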
Frames
Input

- `OpenAILLMContextFrame` - Conversation context and history
- `LLMMessagesFrame` - Direct message list
- `VisionImageRawFrame` - Images for vision processing
- `LLMUpdateSettingsFrame` - Runtime parameter updates
Output

- `LLMFullResponseStartFrame` / `LLMFullResponseEndFrame` - Response boundaries
- `LLMTextFrame` - Streamed completion chunks
- `FunctionCallInProgressFrame` / `FunctionCallResultFrame` - Function call lifecycle
- `ErrorFrame` - API or processing errors
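The boundary frames above delimit each streamed response: text chunks arrive between a start and an end frame. A minimal sketch of consuming that lifecycle, using hypothetical stand-in classes (the real frame types come from Pipecat):

```python
from dataclasses import dataclass


# Hypothetical stand-in frame classes; the real ones are Pipecat frame types.
class LLMFullResponseStartFrame:
    pass


class LLMFullResponseEndFrame:
    pass


@dataclass
class LLMTextFrame:
    text: str


def assemble_response(frames):
    """Collect streamed LLMTextFrame chunks between Start/End boundary frames."""
    chunks, in_response = [], False
    for frame in frames:
        if isinstance(frame, LLMFullResponseStartFrame):
            in_response, chunks = True, []
        elif isinstance(frame, LLMTextFrame) and in_response:
            chunks.append(frame.text)
        elif isinstance(frame, LLMFullResponseEndFrame):
            in_response = False
    return "".join(chunks)
```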
Function Calling
Function Calling Guide - Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications.
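Because NIM is OpenAI-compatible, function definitions follow the familiar JSON-schema tool format. A sketch of what a declaration might look like; the function name and parameters here are hypothetical examples, not part of Pipecat's API:

```python
# A function definition in the OpenAI-compatible tool format.
# The function name and parameters are hypothetical examples.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and state, e.g. 'Austin, TX'",
                },
            },
            "required": ["location"],
        },
    },
}
```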
Context Management
Context Management Guide - Learn how to manage conversation context, handle message history, and integrate context aggregators for consistent conversational experiences.
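Conversation context is, at bottom, an ordered message list. A minimal stand-in for illustration only; Pipecat's context aggregators manage this structure for you:

```python
class ConversationContext:
    """Minimal stand-in for an LLM conversation context.

    This sketch only illustrates the underlying message-history structure;
    it is not Pipecat's context class.
    """

    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def add_user_message(self, text: str) -> None:
        self.messages.append({"role": "user", "content": text})

    def add_assistant_message(self, text: str) -> None:
        self.messages.append({"role": "assistant", "content": text})
```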
Usage Example
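A minimal construction sketch. The import path and model name below are assumptions based on Pipecat's conventions for OpenAI-compatible services; confirm both against the API reference:

```python
import os

# Import path is an assumption; check the API reference for the exact module.
from pipecat.services.nim.llm import NimLLMService

# Model name is an assumption; any NIM-hosted model identifier works here.
llm = NimLLMService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    model="nvidia/llama-3.1-nemotron-70b-instruct",
)
```

The resulting service slots into a pipeline wherever an `OpenAILLMService` would, since it inherits that interface.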
Metrics
Includes specialized token usage tracking for NIM's incremental reporting:

- Time to First Byte (TTFB) - Response latency measurement
- Processing Duration - Total request processing time
- Token Usage - Tracks tokens used per request, compatible with NIM's incremental reporting
Learn how to enable Metrics in your Pipeline.
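Where OpenAI-style APIs report usage once on the final chunk, NIM reports it incrementally across the stream. A minimal sketch of accumulating per-chunk counts into a per-request total, assuming each chunk carries only the tokens counted since the previous chunk (field names are assumptions):

```python
from dataclasses import dataclass


@dataclass
class ChunkUsage:
    """Hypothetical per-chunk usage record; field names are assumptions."""
    prompt_tokens: int
    completion_tokens: int


def total_usage(chunks):
    """Sum incremental per-chunk token counts into a per-request total."""
    prompt = sum(c.prompt_tokens for c in chunks)
    completion = sum(c.completion_tokens for c in chunks)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }
```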
Additional Notes
- OpenAI Compatibility: Full compatibility with OpenAI API features and parameters
- NVIDIA Optimization: Hardware-accelerated inference on NVIDIA infrastructure
- Token Reporting: Custom handling for NIM’s incremental vs. OpenAI’s final token reporting
- Model Variety: Access to Nemotron and other NVIDIA-optimized model variants