Text-to-video generation with Mochi
This example demonstrates how to run the Mochi 1 video generation model by Genmo on Modal.
Here’s a video that we generated, inspired by our logo:
Note that, at the time of writing, the Mochi model requires several minutes on one H100 to produce a high-quality clip of even a few seconds. A single video generation therefore costs about $0.33 at our ~$5/hr rate for H100s.
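The arithmetic behind that estimate is easy to check. The sketch below assumes a four-minute generation, which is our reading of "several minutes" given the quoted $0.33; the function name and its defaults are illustrative only.

```python
def generation_cost(minutes: float, hourly_rate_usd: float = 5.0) -> float:
    """Return the GPU cost in USD for one generation of the given length."""
    return hourly_rate_usd * minutes / 60.0


# Assumption: one generation takes about four minutes on a single H100.
cost = generation_cost(4)
print(f"${cost:.2f}")  # prints "$0.33"
```

Longer clips or more inference steps scale the cost linearly with wall-clock time.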
Keep your eyes peeled for improved efficiency as the open source community works on this new model. We welcome PRs to improve the performance of this example!
Setting up the environment for Mochi
At the time of writing, Mochi is supported natively in the diffusers library,
but only in a pre-release version.
So we’ll need to install diffusers and transformers from GitHub.
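As a rough sketch, an image along these lines could install both libraries from GitHub. Only the diffusers and transformers sources come from the text above; the Python version, the extra packages, and the `build_image` helper are assumptions. In practice you would likely pin each GitHub install to a specific commit by appending `@<commit-sha>` to the URL.

```python
# Illustrative package list. Everything except the two GitHub installs
# is an assumption about what the pipeline needs at runtime.
PIP_PACKAGES = [
    "torch",
    "accelerate",
    "sentencepiece",
    "imageio",
    "imageio-ffmpeg",
    "git+https://github.com/huggingface/transformers",
    "git+https://github.com/huggingface/diffusers",
]


def build_image():
    """Build a Modal Image with pre-release diffusers and transformers."""
    import modal  # imported here so the package list can be inspected without Modal

    return (
        modal.Image.debian_slim(python_version="3.11")
        .apt_install("git")  # pip needs git to install from GitHub URLs
        .pip_install(*PIP_PACKAGES)
    )
```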
Saving outputs
On Modal, we save large or expensive-to-compute data to distributed Volumes.
We’ll use them for saving our Mochi weights, as well as our video outputs.
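A minimal sketch of that setup might look like the following. The Volume names, the mount paths, and the choice of one Volume each for weights and outputs are assumptions made for illustration; `modal.Volume.from_name` with `create_if_missing=True` creates the Volume on first use.

```python
MODEL_PATH = "/models"    # assumed mount path for the model weights
OUTPUTS_PATH = "/outputs"  # assumed mount path for generated videos


def build_volume_mounts() -> dict:
    """Map container paths to named Volumes, creating them if needed."""
    import modal  # imported lazily so the paths above are usable without Modal

    weights = modal.Volume.from_name("mochi-weights", create_if_missing=True)
    outputs = modal.Volume.from_name("mochi-outputs", create_if_missing=True)
    return {MODEL_PATH: weights, OUTPUTS_PATH: outputs}
```

The returned dictionary has the shape that the `volumes=` argument of a Modal function or class expects.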
Downloading the model
We download the model weights into a Volume, which caches them and speeds up cold starts. For more on storing model weights on Modal, see this guide.
This download takes five minutes or more, depending on traffic and network speed.
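The download step itself can be sketched with `huggingface_hub.snapshot_download`. The repository id, the cache directory, and the timeout below are assumptions rather than values taken from this example. On Modal, a function like this would be registered with `@app.function(...)`, passing the image, the Volume mounts, and a `timeout` generous enough to cover a download that may exceed five minutes.

```python
MODEL_REPO = "genmo/mochi-1-preview"  # assumed Hugging Face repository id
MODEL_DIR = "/models"                 # assumed Volume mount path
DOWNLOAD_TIMEOUT_SECONDS = 20 * 60    # assumed headroom over a 5+ minute download


def download_model(repo_id: str = MODEL_REPO, cache_dir: str = MODEL_DIR) -> str:
    """Fetch all files of the model repo into cache_dir and return the local path."""
    from huggingface_hub import snapshot_download  # needs huggingface_hub installed

    # Files already present in cache_dir are reused, so repeat calls are cheap.
    return snapshot_download(repo_id=repo_id, cache_dir=cache_dir)
```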
If you want to launch the download first, before running the rest of the code, use the following command from the folder containing this file:
The --detach flag ensures the download will continue
even if you close your terminal or shut down your computer
while it’s running.
Setting up our Mochi class
We’ll use the @app.cls decorator to define a Modal Class that controls the lifecycle of our cloud container.
We configure it to use our image, the distributed volume, and a single H100 GPU.
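To make the lifecycle concrete, here is a plain-Python outline of such a class; it is a sketch, not the code of this example. On Modal, the class would be decorated with `@app.cls(image=..., volumes=..., gpu="H100")`, the loading method with `@modal.enter()` so it runs once per container start, and the generation method with `@modal.method()`. The model id, output directory, default parameters, and the `slugify` helper are assumptions.

```python
import re


def slugify(text: str, max_len: int = 40) -> str:
    """Turn a prompt into a short, filesystem-friendly name (hypothetical helper)."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len] or "video"


class Mochi:
    def __init__(self, model_id: str = "genmo/mochi-1-preview",
                 outputs_dir: str = "/outputs"):
        self.model_id = model_id        # assumed Hugging Face repository id
        self.outputs_dir = outputs_dir  # assumed Volume mount path
        self.pipe = None

    def load_model(self):
        # On Modal this would carry @modal.enter() and run once per container.
        import torch
        from diffusers import MochiPipeline

        self.pipe = MochiPipeline.from_pretrained(
            self.model_id, torch_dtype=torch.bfloat16
        )
        self.pipe.enable_model_cpu_offload()  # requires accelerate
        self.pipe.enable_vae_tiling()         # lowers peak memory during decoding

    def generate(self, prompt: str, num_inference_steps: int = 64,
                 num_frames: int = 19, fps: int = 30) -> str:
        # On Modal this would carry @modal.method() so it can be called remotely.
        from diffusers.utils import export_to_video

        if self.pipe is None:
            self.load_model()
        output = self.pipe(
            prompt=prompt,
            num_inference_steps=num_inference_steps,
            num_frames=num_frames,
        )
        path = f"{self.outputs_dir}/{slugify(prompt)}.mp4"
        export_to_video(output.frames[0], path, fps=fps)
        return path
```

Loading the pipeline once per container, rather than once per call, is what keeps repeat generations from paying the model-loading cost again.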
Running Mochi inference
We can trigger Mochi inference from our local machine by running the code in the local entrypoint below.
It ensures the model is downloaded to a remote Volume, spins up a new replica to generate a video (which is also saved remotely), and then downloads the video to the local machine.
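The final step, copying the video to the local machine, can be sketched as below. It assumes the remote file arrives as an iterable of byte chunks, which is how we understand `modal.Volume.read_file` to deliver data; the `save_locally` helper and the `/tmp/mochi` default directory are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def save_locally(chunks: Iterable[bytes], remote_path: str,
                 local_dir: str = "/tmp/mochi") -> Path:
    """Write streamed bytes to local_dir, keeping the remote file name."""
    target_dir = Path(local_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    local_path = target_dir / Path(remote_path).name
    local_path.write_bytes(b"".join(chunks))
    return local_path
```

Joining the chunks in memory is fine for clips of a few seconds; for much larger files, writing chunk by chunk would be the safer choice.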
You can trigger it with:
Optional command line flags can be viewed with:
Using these flags, you can tweak your generation from the command line:
Addenda
The remainder of the code in this file is utility code.