	---
	language:
	- "en"
	tags:
	- video
	license: apache-2.0
	pipeline_tag: text-to-video
	library_name: diffusers
	---

	<p align="center">
	<img src="assets/logo.jpg" height=30>
	</p>

	# FastMochi Model Card

	## Model Details

	<div align="center">
	<table style="margin-left: auto; margin-right: auto; border: none;">
	<tr>
	<td>
	<img src="assets/mochi-demo.gif" width="640" alt="Mochi Demo">
	</td>
	</tr>
	<tr>
	<td style="text-align:center;">
	Get an 8X diffusion speedup for Mochi with FastVideo
	</td>
	</tr>
	</table>
	</div>

	FastMochi is an accelerated [Mochi](https://huggingface.co/genmo/mochi-1-preview) model. It can sample high-quality videos in 8 diffusion steps, roughly an 8X speedup over the original Mochi, which uses 64 steps.

	- Developed by: [Hao AI Lab](https://hao-ai-lab.github.io/)
	- License: Apache-2.0
	- Distilled from: [Mochi](https://huggingface.co/genmo/mochi-1-preview)
	- GitHub repository: https://github.com/hao-ai-lab/FastVideo

	## Usage

	- Clone the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository and follow the inference instructions in its README.
	- You can also run FastMochi with the official [Mochi repository](https://github.com/genmoai/mochi), using the script below and these [compatible weights](https://huggingface.co/FastVideo/FastMochi).

	<details>
	<summary>Code</summary>

	```python
	from genmo.mochi_preview.pipelines import (
	    DecoderModelFactory,
	    DitModelFactory,
	    MochiMultiGPUPipeline,
	    T5ModelFactory,
	    linear_quadratic_schedule,
	)
	from genmo.lib.utils import save_video
	import os

	# Read prompts from prompt.txt, one prompt per line.
	with open("prompt.txt", "r") as f:
	    prompts = [line.rstrip() for line in f]

	pipeline = MochiMultiGPUPipeline(
	    text_encoder_factory=T5ModelFactory(),
	    world_size=4,  # number of GPUs used for inference
	    dit_factory=DitModelFactory(
	        model_path="weights/dit.safetensors", model_dtype="bf16"
	    ),
	    decoder_factory=DecoderModelFactory(
	        model_path="weights/decoder.safetensors",
	    ),
	)

	output_dir = "outputs"
	os.makedirs(output_dir, exist_ok=True)

	# Generate one video per prompt using the 8-step distilled sampler.
	for i, prompt in enumerate(prompts):
	    video = pipeline(
	        height=480,
	        width=848,
	        num_frames=163,
	        num_inference_steps=8,
	        sigma_schedule=linear_quadratic_schedule(8, 0.1, 6),
	        cfg_schedule=[1.5] * 8,
	        batch_cfg=False,
	        prompt=prompt,
	        negative_prompt="",
	        seed=12345,
	    )[0]
	    save_video(video, f"{output_dir}/output_{i}.mp4")
	```

	</details>
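
The script above reads `prompt.txt` and treats each line as one prompt. A minimal sketch of preparing that file follows. The helper names `write_prompts` and `read_prompts` are hypothetical and not part of FastVideo or Mochi. Note that `read_prompts` skips blank lines, whereas the script above would pass a blank line to the pipeline as an empty prompt.

```python
from pathlib import Path


def write_prompts(prompts, path="prompt.txt"):
    """Write one prompt per line (hypothetical helper, not a FastVideo API).

    Embedded newlines and repeated whitespace are collapsed so that each
    prompt occupies exactly one line; empty prompts are dropped.
    """
    lines = [" ".join(p.split()) for p in prompts]
    lines = [line for line in lines if line]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def read_prompts(path="prompt.txt"):
    """Read prompts back, one per line, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


if __name__ == "__main__":
    write_prompts([
        "A corgi running on a beach at sunset",
        "A time-lapse of clouds over a mountain lake",
    ])
    print(read_prompts())
```

Running this once before the inference script yields a two-line `prompt.txt`, so the loop above would produce `outputs/output_0.mp4` and `outputs/output_1.mp4`.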


	## Training details

	FastMochi is consistency-distilled on the [MixKit](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main) dataset with the following hyperparameters:
	- Batch size: 32
	- Resolution: 480x848
	- Number of frames: 169
	- Train steps: 128
	- GPUs: 16
	- LR: 1e-6
	- Loss: huber
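
For readers unfamiliar with the loss named above, here is a minimal pure-Python sketch of the standard Huber loss. The model card only says "huber", so the threshold `delta=1.0` and the mean reduction are assumptions for illustration. The actual FastVideo training code may use a different threshold or a variant such as pseudo-Huber.

```python
def huber_loss(pred, target, delta=1.0):
    """Mean Huber loss over two equal-length sequences of floats.

    Quadratic for errors up to `delta`, linear beyond it, which makes it
    less sensitive to outliers than mean squared error.
    `delta=1.0` is an assumed default, not a value from the model card.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    total = 0.0
    for p, t in zip(pred, target):
        err = abs(p - t)
        if err <= delta:
            total += 0.5 * err * err
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(pred)


if __name__ == "__main__":
    # Small error (0.5) falls in the quadratic region,
    # large error (3.0) falls in the linear region.
    print(huber_loss([0.0, 0.0], [0.5, 3.0]))
```

In a real training loop the same formula would be applied elementwise to tensors, for example between the student's prediction and the consistency target.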

	## Evaluation
	Below are qualitative comparisons between FastMochi and the original Mochi, both sampled with 8 inference steps:


	| FastMochi 8 steps | Mochi 8 steps |
	| --- | --- |
	| ![FastMochi 8 step](assets/distilled/1.gif) | ![Mochi 8 step](assets/undistilled/1.gif) |
	| ![FastMochi 8 step](assets/distilled/2.gif) | ![Mochi 8 step](assets/undistilled/2.gif) |
	| ![FastMochi 8 step](assets/distilled/3.gif) | ![Mochi 8 step](assets/undistilled/3.gif) |
	| ![FastMochi 8 step](assets/distilled/4.gif) | ![Mochi 8 step](assets/undistilled/4.gif) |