Introduction
Modal is an AI infrastructure platform that lets you:
- Run low-latency inference with sub-second cold starts, using open-weights or custom models
- Scale out batch jobs to run massively in parallel
- Train or fine-tune open-weights or custom models on the latest GPUs
- Spin up thousands of isolated and secure Sandboxes to execute AI-generated code
- Launch GPU-backed Notebooks in seconds and collaborate with your colleagues in real time
You get full serverless execution and pricing because we host everything and charge per second of usage.
Notably, there’s zero configuration in Modal - everything, including container environments and GPU specification, is code. Take a breath of fresh air and feel how good it tastes with no YAML in it.
Here’s a complete, minimal example of LLM inference running on Modal:
```python
from pathlib import Path

import modal

app = modal.App("example-inference")

# The container environment is defined in code: a slim Debian base
# with transformers and torch installed.
image = modal.Image.debian_slim().uv_pip_install("transformers[torch]")


# Request an H100 GPU and the image above for this Function.
@app.function(gpu="h100", image=image)
def chat(prompt: str | None = None) -> list[dict]:
    from transformers import pipeline

    if prompt is None:
        # By default, ask the model to explain this very file.
        prompt = f"/no_think Read this code.\n\n{Path(__file__).read_text()}\nIn one paragraph, what does the code do?"
    print(prompt)

    context = [{"role": "user", "content": prompt}]
    chatbot = pipeline(
        model="Qwen/Qwen3-1.7B-FP8", device_map="cuda", max_new_tokens=1024
    )
    result = chatbot(context)

    print(result[0]["generated_text"][-1]["content"])
    return result
```

That’s it! You can copy and paste that text into a Python file in your favorite editor and then run it with `modal run path/to/file.py`.
How does it work?
Modal takes your code, puts it in a container, and executes it in the cloud. If you get a lot of traffic, Modal automatically scales up the number of containers as needed. This means you don’t need to mess with Kubernetes, Docker, or even an AWS account.
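For example, scaling out is a one-liner: calling `.map()` on a Function fans its inputs out across containers that Modal starts and stops for you. Here’s a minimal sketch (the app and function names are illustrative):

```python
import modal

app = modal.App("example-scale-out")


@app.function()
def square(x: int) -> int:
    # Each call may run in its own container in the cloud.
    return x * x


@app.local_entrypoint()
def main():
    # Modal fans these 1,000 inputs out across containers,
    # scaling up and back down automatically.
    print(sum(square.map(range(1000))))
```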
We pool capacity across all major clouds. That means we can optimize for both high GPU availability and low cost by dynamically deciding where to run your code based on the best available capacity.
Programming language support
Python is the primary language for building Modal applications and implementing Modal Functions, but you can also use JavaScript/TypeScript or Go to call Modal Functions, run Sandboxes, and manage Modal resources.
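In Python, calling an already-deployed Function from other code looks like the following sketch; the JavaScript/TypeScript and Go SDKs follow the same lookup-and-call pattern. This assumes the earlier `example-inference` app has been deployed with `modal deploy`:

```python
import modal

# Look up a Function on a deployed app by app name and function name.
chat = modal.Function.from_name("example-inference", "chat")

# .remote() runs the Function in the cloud and returns its result.
result = chat.remote("Summarize what Modal does in one sentence.")
print(result)
```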
Getting started
Developing with Modal is easy because you don’t have to set up any infrastructure. Just:
- Create an account at modal.com
- Run `pip install modal` to install the `modal` Python package
- Run `modal setup` to authenticate (if this doesn’t work, try `python -m modal setup`)
…and you can start running jobs right away. A first run can be as small as the sketch below.
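As a quick smoke test, here’s a minimal hello-world sketch (the app name, function, and file name are all placeholders):

```python
import modal

app = modal.App("example-hello")


@app.function()
def hello(name: str) -> str:
    # This body executes in a container in Modal's cloud.
    return f"Hello, {name}!"


@app.local_entrypoint()
def main():
    # .remote() ships the call to the cloud and returns the result locally.
    print(hello.remote("Modal"))
```

Save it as `hello.py` and run `modal run hello.py`.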
Once that works, check out some of our simple getting started examples. And when you’re ready for something fancier, explore our full library of examples, like:
- Running your own LLM inference
- Transcribing speech in real time with Kyutai STT
- Fine-tuning Flux
- Building a coding agent with Modal Sandboxes and LangGraph
- Training a small language model from scratch
- Parallel processing of Parquet files on S3
- Parsing documents with dots.ocr in a Modal Notebook
You can also learn Modal interactively without installing anything through our code playground.