Install the Cerebrium CLI
To get started, run the following commands:

- Run `pip install cerebrium` to install the Python package.
- Run `cerebrium login` to authenticate yourself.
Create a Cerebrium project
- Create a new Cerebrium project.
- This will create two key files:
  - `main.py` - Your application entrypoint
  - `cerebrium.toml` - Configuration for build and environment settings
Update your `cerebrium.toml` with the necessary configuration:

- `OPENAI_API_KEY` - We use OpenAI for the LLM. You can get your API key from here.
- `DAILY_TOKEN` - For WebRTC communication. You can get your token from here.
- `CARTESIA_API_KEY` - For text-to-speech services. You can get your API key from here.
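Inside the running app, these values are available as environment variables. A minimal sketch of reading them in Python — the key names match the list above, while the `load_secrets` helper is our own illustration, not part of any SDK:

```python
import os

# The three secrets this tutorial relies on (names as configured above).
REQUIRED_KEYS = ("OPENAI_API_KEY", "DAILY_TOKEN", "CARTESIA_API_KEY")

def load_secrets() -> dict:
    """Read the required secrets from the environment, failing fast if any are missing."""
    missing = [k for k in REQUIRED_KEYS if not os.environ.get(k)]
    if missing:
        raise RuntimeError(f"Missing required secrets: {', '.join(missing)}")
    return {k: os.environ[k] for k in REQUIRED_KEYS}
```

Failing fast at startup is preferable to a cryptic authentication error deep inside the pipeline.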
Agent setup
We create a basic pipeline setup in our `main.py` that combines our LLM, TTS, and the Daily WebRTC transport layer.
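Conceptually, the pipeline passes each frame through the services in order. The toy sketch below mirrors that shape with plain functions; the real `main.py` uses the framework's own transport, LLM, and TTS classes rather than these stand-ins:

```python
from typing import Callable, List

class Pipeline:
    """Toy frame pipeline: each stage transforms the frame and passes it on."""

    def __init__(self, stages: List[Callable[[str], str]]):
        self.stages = stages

    def run(self, frame: str) -> str:
        for stage in self.stages:
            frame = stage(frame)
        return frame

def llm(prompt: str) -> str:   # stand-in for the OpenAI LLM service
    return f"llm({prompt})"

def tts(text: str) -> str:     # stand-in for the Cartesia TTS service
    return f"tts({text})"

pipeline = Pipeline([llm, tts])
print(pipeline.run("hello"))  # → tts(llm(hello))
```

The transport layer sits at both ends of the real pipeline: it feeds user audio in and plays the synthesized speech back out.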
Deploy bot
Deploy your application to Cerebrium. This creates an endpoint, `POST <BASE_URL>/main`, that you can call with your `room_url` and `token`. Let us test it.
Test it out
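One minimal way to exercise the endpoint from Python is to build the request with the standard library. The `<BASE_URL>` placeholder, the bearer auth token, and the exact payload field names are assumptions based on the description above; adjust them to your deployment:

```python
import json
import urllib.request

def build_start_request(base_url: str, room_url: str, token: str,
                        auth_token: str) -> urllib.request.Request:
    """Build the POST <BASE_URL>/main request that starts the bot in a Daily room."""
    return urllib.request.Request(
        f"{base_url}/main",
        data=json.dumps({"room_url": room_url, "token": token}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {auth_token}",  # Cerebrium auth token (assumed scheme)
            "Content-Type": "application/json",
        },
        method="POST",
    )

# With a real deployment you would then send it:
# response = urllib.request.urlopen(
#     build_start_request("<BASE_URL>", room_url, daily_token, auth_token))
```

Once the bot joins, open the Daily room in your browser and speak to it.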
Future Considerations
Cerebrium supports both CPU and GPU workloads, so if you would like to lower the latency of your application, the best approach is to fetch model weights from the various providers and run the models locally. You can do this for:

- LLM: Run any open-source model using a framework such as vLLM
- TTS: Both PlayHt and Deepgram offer TTS models that can be run locally
- STT: Deepgram offers a model that can be run locally
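For the LLM case, vLLM can serve an open-source model behind an OpenAI-compatible HTTP API, so the LLM stage only needs its base URL pointed at the local server. The sketch below builds such a request with the standard library; the model name and local port are assumptions:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a request for vLLM's OpenAI-compatible /v1/chat/completions route."""
    body = {
        "model": model,  # the locally served open-source model (assumed name)
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Against a local vLLM server (port 8000 is vLLM's default, assumed here):
# response = urllib.request.urlopen(
#     build_chat_request("http://localhost:8000", "<model>", "Hello"))
```

Because the wire format matches OpenAI's, the rest of the pipeline stays unchanged when you swap the hosted LLM for a local one.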
Examples
- Fastest voice agent: Local only implementation
- RAG voice agent: Create a voice agent that can do RAG using Cerebrium + OpenAI + Pinecone
- Twilio voice agent: Create a voice agent that can receive phone calls via Twilio
- OpenAI Realtime API implementation: Create a voice agent that can receive phone calls via OpenAI Realtime API