# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"

huggingface-cli download --local-dir wan2.2-t2v-a14b-diffusers-8bit AbstractFramework/wan2.2-t2v-a14b-diffusers-8bit

wan2.2-t2v-a14b-diffusers-8bit

This repository contains mixed q8/BF16 MLX-Gen saved weights for Wan-AI/Wan2.2-T2V-A14B-Diffusers. It is designed for local Apple Silicon inference with mlx-gen.

It uses the mflux/MLX saved-weight layout with MLX quantization tensors, so it cannot be loaded as a Diffusers or Transformers from_pretrained() checkpoint.

Source Model

Original model: Wan-AI/Wan2.2-T2V-A14B-Diffusers.

This quantized derivative follows the Apache 2.0 license of the source model.

Quantization

This is a mixed q8/BF16 checkpoint:

  • q8 for quantizable Wan transformer block attention and feed-forward linears.
  • BF16 for the Wan VAE.
  • BF16 for Wan transformer conditioning/output projection linears, the UMT5 text encoder, scheduler metadata, tokenizer files, norms, convolutions, and other non-quantizable parameters.

This mixed policy is used because fully quantizing sensitive Wan A14B paths produced invalid or low-quality video in local validation.
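The mixed policy above amounts to a per-module decision: quantize a linear layer to q8 only if it is an attention or feed-forward linear inside a Wan transformer block, and keep everything else in BF16. The sketch below expresses that rule as a plain Python function. The module paths (for example "transformer_2.blocks.N.ffn.net.2") are hypothetical, loosely modeled on Diffusers' Wan naming, and are not taken from mlx-gen; the real package may match different names.

```python
import re

# Hypothetical module paths, loosely modeled on Diffusers' Wan naming
# ("transformer" and "transformer_2" for the two A14B experts).
# The real mlx-gen rules may match different names.
_Q8_PATTERNS = (
    # Self- and cross-attention projections inside each transformer block.
    re.compile(r"^transformer(_2)?\.blocks\.\d+\.attn[12]\.(to_q|to_k|to_v|to_out\.0)$"),
    # Feed-forward linears inside each transformer block.
    re.compile(r"^transformer(_2)?\.blocks\.\d+\.ffn\.net\.(0\.proj|2)$"),
)


def target_dtype(path: str, is_linear: bool) -> str:
    """Return "q8" or "bf16" for one module under the mixed policy."""
    # Norms, convolutions, embeddings and other non-linear modules stay BF16.
    if not is_linear:
        return "bf16"
    # Only block attention and feed-forward linears are quantized to 8 bits.
    if any(pattern.match(path) for pattern in _Q8_PATTERNS):
        return "q8"
    # Conditioning/output projections, the UMT5 text encoder and the VAE stay BF16.
    return "bf16"


examples = [
    ("transformer.blocks.0.attn1.to_q", True),
    ("transformer_2.blocks.39.ffn.net.2", True),
    ("transformer.proj_out", True),
    ("text_encoder.encoder.block.0.layer.0.SelfAttention.q", True),
    ("vae.decoder.conv_in", False),
]
for path, is_linear in examples:
    print(f"{path}: {target_dtype(path, is_linear)}")
```

In recent MLX releases, mlx.nn.quantize accepts a class_predicate callable that receives each module's path and the module itself, so a rule like this could be adapted into such a predicate. Whether mlx-gen applies its policy that way is an assumption.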

Validation

Measured on 2026-06-04 with mlx-gen 0.18.9 on Apple Silicon. Before these packages were prepared, the upstream Diffusers source snapshot occupied about 118 GiB in the local Hugging Face cache. The table below reports, for each prepared package, a generation run measured from model initialization through MP4 save and post-save video-health validation.

Validation profile: 384x224, 33 frames, 12 denoising steps, guidance 4, guidance-2 3, 8 fps, seed 4242, --low-ram.

| Package | Disk | Full-Process Physical Peak | Max RSS | MLX Peak | Total Time | Video Health |
|---|---|---|---|---|---|---|
| BF16 package | 64.3 GiB | 33.0 GiB | 31.8 GiB | 27.7 GiB | 152.7 s | 33/33 frames, 384x224, 8 fps, temporal delta 1.3 |
| This mixed q8/BF16 package | 39.7 GiB | 20.7 GiB | 19.5 GiB | 15.5 GiB | 154.8 s | 33/33 frames, 384x224, 8 fps, temporal delta 1.4 |

Compared with the BF16 prepared package at the same validation profile, this mixed q8/BF16 package reduces disk usage by about 38% (64.3 GiB to 39.7 GiB) and full-process physical peak memory by about 37% (33.0 GiB to 20.7 GiB). Total time was about 1.4% (2.1 s) slower in this run.
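The percentages above can be recomputed directly from the table. This short sketch copies the table values into two dictionaries (the key names are illustrative) and prints the signed change of the mixed package relative to the BF16 package for every numeric column.

```python
# Values copied from the validation table (GiB for sizes, seconds for time).
bf16 = {"disk_gib": 64.3, "phys_peak_gib": 33.0, "max_rss_gib": 31.8,
        "mlx_peak_gib": 27.7, "time_s": 152.7}
mixed = {"disk_gib": 39.7, "phys_peak_gib": 20.7, "max_rss_gib": 19.5,
         "mlx_peak_gib": 15.5, "time_s": 154.8}


def pct_change(new: float, old: float) -> float:
    """Signed percent change from old to new (negative means a reduction)."""
    return (new - old) / old * 100.0


for key in bf16:
    print(f"{key}: {pct_change(mixed[key], bf16[key]):+.1f}%")
# disk_gib: -38.3%
# phys_peak_gib: -37.3%
# max_rss_gib: -38.7%
# mlx_peak_gib: -44.0%
# time_s: +1.4%
```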

Full-process physical peak is Darwin's ri_phys_footprint, sampled over the whole process. The validation is intentionally small and repeatable; it does not imply that a full-size job (for example 1280x720, 81 frames, 40 steps) has the same memory or timing profile.
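This card does not say how the Max RSS column was collected. One common way to get a comparable number for your own runs, using only the Python standard library, is to run the generation as a child process and read ru_maxrss for waited-for children afterwards. This is an assumed method, not necessarily the one behind the table, and the helper name is hypothetical. It does not reproduce ri_phys_footprint, which requires Darwin's proc_pid_rusage and is not exposed by the standard library.

```python
import resource
import subprocess
import sys


def run_and_report_max_rss_gib(cmd: list) -> float:
    """Run cmd to completion and return the largest child peak RSS in GiB."""
    # subprocess.run waits for the child, so its usage is included below.
    subprocess.run(cmd, check=True)
    ru_maxrss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # macOS reports ru_maxrss in bytes; Linux reports it in kibibytes.
    peak_bytes = ru_maxrss if sys.platform == "darwin" else ru_maxrss * 1024
    return peak_bytes / 2**30


if __name__ == "__main__":
    # Replace this trivial command with the full mlxgen generate invocation.
    print(f"{run_and_report_max_rss_gib([sys.executable, '-c', 'pass']):.3f} GiB")
```

Because RUSAGE_CHILDREN covers every child the parent has waited for, measure one generation per parent process to keep the number attributable to a single run.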

Usage

python -m pip install -U mlx-gen

mlxgen download --model AbstractFramework/wan2.2-t2v-a14b-diffusers-8bit

mlxgen generate \
  --model AbstractFramework/wan2.2-t2v-a14b-diffusers-8bit \
  --task text-to-video \
  --prompt "A cinematic scene of a scientist working on agentic AI through the night, monitors glowing, papers shifting in a slow dolly shot." \
  --width 384 \
  --height 224 \
  --frames 33 \
  --steps 12 \
  --guidance 4 \
  --guidance-2 3 \
  --fps 8 \
  --seed 4242 \
  --low-ram \
  --metadata \
  --output video.mp4
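For repeatable runs, the same command can be assembled from the validation profile in a script. In this sketch the flag names and values are copied from the command above, while the helper name and the PROFILE dictionary are hypothetical. It only builds and prints the argument list; running it is left to subprocess.run.

```python
import shlex

MODEL = "AbstractFramework/wan2.2-t2v-a14b-diffusers-8bit"

# Validation profile from this card; keys are mlxgen flag names without "--".
PROFILE = {
    "width": 384, "height": 224, "frames": 33, "steps": 12,
    "guidance": 4, "guidance-2": 3, "fps": 8, "seed": 4242,
}


def build_generate_argv(prompt: str, output: str, profile: dict = PROFILE) -> list:
    """Build an mlxgen generate argument list (the profile is only read)."""
    argv = ["mlxgen", "generate", "--model", MODEL,
            "--task", "text-to-video", "--prompt", prompt]
    # Valued flags from the profile, in insertion order.
    for flag, value in profile.items():
        argv += [f"--{flag}", str(value)]
    # Switches used in the validation run, then the output path.
    argv += ["--low-ram", "--metadata", "--output", output]
    return argv


argv = build_generate_argv("A slow dolly shot of a lab at night.", "video.mp4")
print(shlex.join(argv))
# To run it: subprocess.run(argv, check=True)
```

Passing a list instead of a single shell string avoids quoting problems with long prompts.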

Compatibility

Requires mlx-gen >= 0.18.9.
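A script can check the minimum version before loading this package. This is a minimal sketch using importlib.metadata with the distribution name "mlx-gen" from the install command; the helper names are hypothetical. It compares only the numeric release part, so a pre-release such as "0.18.9rc1" would pass; packaging.version.Version is stricter if that package is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

MIN_MLX_GEN = (0, 18, 9)  # from the Compatibility note


def release_tuple(v: str) -> Tuple[int, ...]:
    """Parse the leading numeric release of a version string, e.g. '0.18.9'."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def mlx_gen_is_compatible(installed: Optional[str] = None) -> bool:
    """True if the given (or installed) mlx-gen version is at least 0.18.9."""
    if installed is None:
        try:
            installed = version("mlx-gen")
        except PackageNotFoundError:
            return False
    # Tuple comparison: (0, 18) < (0, 18, 9) < (0, 18, 10) < (0, 19).
    return release_tuple(installed) >= MIN_MLX_GEN


print(mlx_gen_is_compatible())
```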

Generated with mlx-gen 0.18.9.

Use the mlxgen command and Python import path for new MLX-Gen projects.

Attribution

MLX-Gen is based on mflux by Filip Strand and the original mflux contributors.

Quantized and contributed by @lpalbou.
