How to deploy LiveKit Agents on Modal

NOTE: We are currently updating our LiveKit example so it’s up to date with the latest release. Check back soon.

If you are building a real-time voice or video application, plain HTTP is not enough. It’s too slow. Traditional HTTP is request-response based, so every interaction carries its own overhead, and each new connection adds the latency of TCP setup and handshaking on top.
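To make that overhead concrete, here is a rough back-of-the-envelope sketch. The 50 ms round-trip time is an assumed, illustrative figure, not a measurement, and the function name is hypothetical. The round-trip counts (one for the TCP handshake, one for a full TLS 1.3 handshake, one for the request and its response) are the usual minimums for a fresh HTTPS connection; server processing and payload transfer time are ignored.

```python
def first_response_latency_ms(rtt_ms: float, new_connection: bool = True) -> float:
    """Estimate the time until the first response arrives, in milliseconds.

    Counts network round trips only; ignores server processing time
    and the time needed to transfer the payload itself.
    """
    tcp_handshake_rtts = 1    # SYN, SYN-ACK, ACK
    tls13_handshake_rtts = 1  # full TLS 1.3 handshake
    request_rtts = 1          # request out, response back

    rtts = request_rtts
    if new_connection:
        rtts += tcp_handshake_rtts + tls13_handshake_rtts
    return rtts * rtt_ms


RTT_MS = 50.0  # assumed round-trip time, for illustration only

cold = first_response_latency_ms(RTT_MS)                        # fresh connection
warm = first_response_latency_ms(RTT_MS, new_connection=False)  # reused connection

print(f"new connection:    {cold:.0f} ms before the first response")
print(f"reused connection: {warm:.0f} ms before the first response")
```

Under these assumptions a fresh connection spends three round trips before any response arrives, and even a reused connection pays one full round trip per exchange. An established WebRTC media stream, by contrast, sends audio packets continuously without waiting for a per-packet response.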

Instead, you should be using technologies like WebRTC. WebRTC is purpose-built for peer-to-peer audio/video streaming and data sharing without requiring plugins or additional software.

But WebRTC is complex, and it’s not easy to get right. You often have to write thousands of lines of boilerplate code to handle signalling, media capture, peer connections, ICE candidates, STUN/TURN servers, and more.

That’s why LiveKit has become so popular. LiveKit is an open-source platform (a media server plus client and server SDKs) that abstracts away the complexity of working with WebRTC. Rather than having to deal with all the boilerplate yourself, you just use LiveKit’s SDK.

LiveKit Agents

LiveKit recently launched LiveKit Agents, a framework for building real-time voice assistants.

It allows you to define an AI agent that will join as a participant in a LiveKit room.

LiveKit Agent Lifecycle

Here’s a high-level overview of the agent lifecycle:

Worker registration: Your agent connects to the LiveKit server, registering as a “worker” via a WebSocket.
Agent dispatch: When a user connects to a room, the LiveKit server selects an available worker, which then instantiates your program and joins the room. A worker can run multiple agent instances in separate processes.
Your program: Here, you use the LiveKit Python SDK and can add plugins for processing voice and video data.
Room close: The room closes automatically when the last non-agent participant leaves, and any remaining agents are then disconnected.
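The four steps above can be sketched as a toy model in plain Python. This is not the LiveKit API: `ToyServer`, `Room`, `register_worker`, `user_joins`, and `user_leaves` are hypothetical names invented for illustration, worker selection is reduced to "pick the first registered worker", and dispatch is triggered by the first user in a room. It only mirrors the order of events described in the lifecycle.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Room:
    name: str
    users: set = field(default_factory=set)
    agents: set = field(default_factory=set)
    closed: bool = False


class ToyServer:
    """Hypothetical stand-in for the LiveKit server (not the real API)."""

    def __init__(self):
        self.workers = []  # (worker_id, entrypoint) pairs
        self.rooms = {}
        self.log = []

    def register_worker(self, worker_id, entrypoint):
        # Step 1: a worker announces that it is available.
        self.workers.append((worker_id, entrypoint))
        self.log.append(f"registered {worker_id}")

    async def user_joins(self, room_name, user):
        room = self.rooms.setdefault(room_name, Room(room_name))
        first_user = not room.users
        room.users.add(user)
        if first_user and self.workers:
            # Step 2: pick a worker (naively, the first one) and dispatch it.
            worker_id, entrypoint = self.workers[0]
            self.log.append(f"dispatched {worker_id} to {room_name}")
            await entrypoint(room)

    def user_leaves(self, room_name, user):
        room = self.rooms[room_name]
        room.users.discard(user)
        if not room.users:
            # Step 4: last non-agent participant left, so close the room
            # and disconnect any remaining agents.
            room.agents.clear()
            room.closed = True
            self.log.append(f"closed {room_name}")


async def entrypoint(room):
    # Step 3: "your program" runs here; this toy agent just joins the room.
    room.agents.add("agent-1")


async def main():
    server = ToyServer()
    server.register_worker("worker-A", entrypoint)
    await server.user_joins("demo", "alice")
    agents_during_call = set(server.rooms["demo"].agents)
    server.user_leaves("demo", "alice")
    return server, agents_during_call


server, agents_during_call = asyncio.run(main())
print("\n".join(server.log))
```

In a real deployment the server, not your code, owns the dispatch and room-close logic; the only part you write is the equivalent of `entrypoint`.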

You can deploy LiveKit Agents on Render, Kubernetes, and other cloud providers, but we think Modal is the best option. Modal is a serverless cloud platform and Python library. With Modal, you can write a Python function, add a Modal decorator, and deploy your application in a container in the cloud in seconds.

✅ No Infrastructure Management

Modal removes the complexity of managing Kubernetes clusters or provisioning cloud instances. Your LiveKit agents run in a fully managed environment with zero operational overhead.

✅ Automatic Scaling

With Modal, you can scale your LiveKit workloads dynamically based on demand. Modal’s serverless execution model ensures you only pay for what you use.

✅ Optimized GPU Execution

If your agent needs to run deep learning models, Modal supports running your workloads on GPUs like NVIDIA H100s.

Conclusion

LiveKit Agents allows developers to build real-time voice assistants with minimal effort.

And the best way to deploy is with Modal!