| | --- |
| | language: |
| | - "en" |
| | tags: |
| | - video |
| | license: apache-2.0 |
| | pipeline_tag: text-to-video |
| | library_name: diffusers |
| | --- |
| | |
| | <p align="center"> |
| | <img src="assets/logo.jpg" height=30> |
| | </p> |
| |
|
| | # FastMochi Model Card |
| |
|
| | ## Model Details |
| |
|
| | <div align="center"> |
| | <table style="margin-left: auto; margin-right: auto; border: none;"> |
| | <tr> |
| | <td> |
| | <img src="assets/mochi-demo.gif" width="640" alt="Mochi Demo"> |
| | </td> |
| | </tr> |
| | <tr> |
| | <td style="text-align:center;"> |
| | Get an 8x diffusion speedup for Mochi with FastVideo |
| | </td> |
| | </tr> |
| | </table> |
| | </div> |
| | |
| | FastMochi is an accelerated [Mochi](https://huggingface.co/genmo/mochi-1-preview) model. It can sample high-quality videos in 8 diffusion steps, which is roughly an 8x speedup over the original Mochi with 64 steps. |
| |
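As a rough sanity check on the speedup figure, the sketch below compares step counts. It assumes every diffusion step costs the same, and the second call adds a fixed per-video overhead (for example text encoding and VAE decoding). The function name, `step_cost`, and `fixed_overhead` are hypothetical values chosen for illustration, not measurements.

```python
def estimated_speedup(baseline_steps, distilled_steps, step_cost=1.0, fixed_overhead=0.0):
    """Estimate the end-to-end speedup from reducing the number of diffusion steps.

    Assumption: each step costs `step_cost` and every video pays the same
    `fixed_overhead`, regardless of how many steps are sampled.
    """
    baseline_time = fixed_overhead + baseline_steps * step_cost
    distilled_time = fixed_overhead + distilled_steps * step_cost
    return baseline_time / distilled_time


# Pure step-count ratio: 64 steps vs. 8 steps.
print(estimated_speedup(64, 8))  # 8.0

# With a hypothetical fixed overhead worth 2 steps, the gain is smaller.
print(estimated_speedup(64, 8, step_cost=1.0, fixed_overhead=2.0))  # 6.6
```

Under these assumptions the step-count ratio is an upper bound, which is consistent with the "around 8x" wording.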
|
| | - **Developed by**: [Hao AI Lab](https://hao-ai-lab.github.io/) |
| | - **License**: Apache-2.0 |
| | - **Distilled from**: [Mochi](https://huggingface.co/genmo/mochi-1-preview) |
| | - **GitHub repository**: https://github.com/hao-ai-lab/FastVideo |
| |
|
| | ## Usage |
| |
|
| | - Clone the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository and follow the inference instructions in the README. |
| | - You can also run FastMochi using the official [Mochi repository](https://github.com/genmoai/mochi) with the script below and these [compatible weights](https://huggingface.co/FastVideo/FastMochi). |
| |
|
| | <details> |
| | <summary>Code</summary> |
| |
|
| | ```python |
| | import os |
| | |
| | from genmo.lib.utils import save_video |
| | from genmo.mochi_preview.pipelines import ( |
| |     DecoderModelFactory, |
| |     DitModelFactory, |
| |     MochiMultiGPUPipeline, |
| |     T5ModelFactory, |
| |     linear_quadratic_schedule, |
| | ) |
| | |
| | # Read prompts from prompt.txt, one per line, skipping blank lines. |
| | with open("prompt.txt", "r") as f: |
| |     prompts = [line.strip() for line in f if line.strip()] |
| | |
| | # Build the multi-GPU pipeline from the downloaded weights. |
| | pipeline = MochiMultiGPUPipeline( |
| |     text_encoder_factory=T5ModelFactory(), |
| |     world_size=4, |
| |     dit_factory=DitModelFactory( |
| |         model_path="weights/dit.safetensors", model_dtype="bf16" |
| |     ), |
| |     decoder_factory=DecoderModelFactory( |
| |         model_path="weights/decoder.safetensors", |
| |     ), |
| | ) |
| | |
| | output_dir = "outputs" |
| | os.makedirs(output_dir, exist_ok=True) |
| | |
| | # Generate one video per prompt with 8 sampling steps. |
| | for i, prompt in enumerate(prompts): |
| |     video = pipeline( |
| |         height=480, |
| |         width=848, |
| |         num_frames=163, |
| |         num_inference_steps=8, |
| |         sigma_schedule=linear_quadratic_schedule(8, 0.1, 6), |
| |         cfg_schedule=[1.5] * 8, |
| |         batch_cfg=False, |
| |         prompt=prompt, |
| |         negative_prompt="", |
| |         seed=12345, |
| |     )[0] |
| |     save_video(video, f"{output_dir}/output_{i}.mp4") |
| | ``` |
| |
|
| | </details> |
| |
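The script reads `prompt.txt` with one prompt per line. As a minimal sketch, the helpers below write and read such a file. The names `write_prompts` and `read_prompts` and the example prompts are hypothetical and not part of FastVideo or the Mochi repository. The reader follows the same one-prompt-per-line convention and additionally skips blank lines.

```python
import os
import tempfile


def write_prompts(path, prompts):
    """Write one prompt per line, collapsing internal whitespace and newlines."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            f.write(" ".join(prompt.split()) + "\n")


def read_prompts(path):
    """Read prompts line by line, dropping surrounding whitespace and blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Round-trip two hypothetical prompts through a temporary prompt.txt.
with tempfile.TemporaryDirectory() as tmp:
    prompt_path = os.path.join(tmp, "prompt.txt")
    write_prompts(prompt_path, ["A red fox running\nthrough snow", "Waves at sunset"])
    loaded = read_prompts(prompt_path)

print(loaded)  # ['A red fox running through snow', 'Waves at sunset']
```

Flattening newlines on write matters because a multi-line prompt would otherwise be split into several separate videos.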
|
| |
|
| | ## Training Details |
| |
|
| | FastMochi is consistency-distilled on the [MixKit](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main) dataset with the following hyperparameters: |
| | - Batch size: 32 |
| | - Resolution: 480x848 |
| | - Number of frames: 169 |
| | - Train steps: 128 |
| | - GPUs: 16 |
| | - LR: 1e-6 |
| | - Loss: Huber |
| |
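For reference, the hyperparameters above can be collected in a small config object. This is only an illustrative sketch: `DistillConfig` is a hypothetical name, not FastVideo's actual configuration class, and `samples_seen` assumes the listed batch size is the global batch size. The `huber` function shows the standard Huber loss on a scalar residual. The threshold `delta=1.0` is an assumed value, since the card does not state which one was used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistillConfig:
    """Hypothetical container for the hyperparameters listed in this card."""
    batch_size: int = 32
    height: int = 480
    width: int = 848
    num_frames: int = 169
    train_steps: int = 128
    num_gpus: int = 16
    learning_rate: float = 1e-6
    loss: str = "huber"

    @property
    def samples_seen(self):
        # Assumption: batch_size is the global batch size per optimizer step.
        return self.batch_size * self.train_steps


def huber(residual, delta=1.0):
    """Standard Huber loss: quadratic near zero, linear beyond `delta`."""
    magnitude = abs(residual)
    if magnitude <= delta:
        return 0.5 * magnitude * magnitude
    return delta * (magnitude - 0.5 * delta)


cfg = DistillConfig()
print(cfg.samples_seen)          # 4096
print(huber(0.5), huber(3.0))    # 0.125 2.5
```

The linear tail is what makes the Huber loss less sensitive to large residuals than a plain squared error.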
|
| | ## Evaluation |
| | We provide qualitative comparisons between FastMochi with 8-step inference and the original Mochi with 8-step inference: |
| |
|
| |
|
| | | FastMochi 8 steps | Mochi 8 steps | |
| | | --- | --- | |
| | |  |  | |
| | |  |  | |
| | |  |  | |
| | |  |  | |
| |
|
| |
|