How to use QuantFunc/Z-Image-Series with Diffusers:
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("QuantFunc/Z-Image-Series", dtype=torch.bfloat16, device_map="cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
License & Attribution
These are quantized derivative weights of Tongyi-MAI/Z-Image-Turbo (Z-Image-Turbo).
- Modifications: the original weights were quantized (e.g. W4A4 / FP4 / INT4 / FP8) and repackaged for the QuantFunc inference engine, which is a "modification" under Apache-2.0 §4(b).
- Upstream license: the base model is licensed under the Apache License 2.0, included here as LICENSE-APACHE; the upstream copyright and attribution notices are retained.
- This derivative: the QuantFunc quantization & packaging are additionally provided under the QuantFunc Model License (see LICENSE).
- This repository is not affiliated with or endorsed by the upstream model authors.
QuantFunc
Hugging Face | ModelScope | GitHub | WeChat (微信) | Discord
Z-Image-Turbo: ultra-fast text-to-image. Pre-quantized for the QuantFunc plugin: 2x to 11x speedup, running from RTX 20-series up.
Pre-quantized Z-Image-Turbo (Alibaba Tongyi) for the Lighting engine: a high-speed distilled text-to-image model (~1.2s per image on RTX 4090, ~4.2x).
Powered by the QuantFunc ComfyUI plugin, the fastest diffusion inference engine:
- 2x to 11x speedup over standard BF16/FP16 Python pipelines (pre-exported, for even faster loading).
- Native C++/CUDA (libquantfunc.so/quantfunc.dll) with zero Python model dependencies.
- Dual engine (SVDQ offline + Lighting runtime 4-bit), zero-cost LoRA stacking, reference-image editing & inpainting.
- Full GPU coverage: RTX 20/30/40/50 · A100/H100/H200/B100/B200/GB300 · RTX 6000 Ada / PRO Blackwell (CUDA 12 & 13); native FP4 on Blackwell.
Install the plugin: https://github.com/QuantFunc/ComfyUI-QuantFunc
Z-Image-Series
Pre-quantized Z-Image-Turbo text-to-image model series by QuantFunc, with Lighting backend inference support.
Overview
Z-Image-Turbo is a high-speed text-to-image diffusion model distilled from the Alibaba Tongyi team's image generation model. This repository provides the complete inference model, pre-quantized and exported via QuantFunc.
With the latest QuantFunc ComfyUI plugin, inference achieves a 2x to 11x speedup over mainstream frameworks.
Hardware Requirements
- Supports NVIDIA RTX 20 series and above
- The RTX 20 series does not support BF16, which causes significant precision loss when quantizing Qwen-series models. The 20 series therefore currently supports only Z-Image models.
Directory Structure
Z-Image-Series/
├── z-image-series-50x-above-base-model/    # Base model, optimized for RTX 50 series and above
│   ├── text_encoder/                       # Qwen3 text encoder (pre-quantized)
│   ├── vae/                                # VAE decoder (~160MB)
│   ├── tokenizer/                          # Tokenizer
│   ├── scheduler/                          # Scheduler config
│   ├── model_index.json                    # Model index
│   └── quantfunc_config.json               # QuantFunc quantization config
├── z-image-series-50x-below-base-model/    # Base model, optimized for GPUs below the RTX 50 series
│   └── (same structure as above)
└── transformer/
    ├── config.json                                     # Transformer architecture config
    ├── z-image-turbo-50x-above-lighting.safetensors    # RTX 50+ Lighting (~3.5GB)
    └── z-image-turbo-50x-below-lighting.safetensors    # RTX 20/30/40 Lighting (~3.3GB)
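Before running inference it can help to confirm that a local copy matches the layout above. The following is a minimal sketch, not part of QuantFunc: the helper name `missing_entries` is hypothetical, and it assumes the below-variant base model contains the same entries as the above-variant one, as the tree indicates.

```python
from pathlib import Path

# Entries listed under each base-model directory in the tree above.
BASE_MODEL_ENTRIES = [
    "text_encoder",
    "vae",
    "tokenizer",
    "scheduler",
    "model_index.json",
    "quantfunc_config.json",
]


def missing_entries(root, variant):
    """Return expected paths (relative to root) that do not exist.

    Hypothetical helper: `variant` is "above" or "below", matching the
    "50x-above" / "50x-below" naming used in this repository.
    """
    if variant not in ("above", "below"):
        raise ValueError("variant must be 'above' or 'below'")
    root = Path(root)
    base = root / f"z-image-series-50x-{variant}-base-model"
    expected = [base / name for name in BASE_MODEL_ENTRIES]
    expected.append(root / "transformer" / "config.json")
    expected.append(
        root / "transformer" / f"z-image-turbo-50x-{variant}-lighting.safetensors"
    )
    return [str(p.relative_to(root)) for p in expected if not p.exists()]


if __name__ == "__main__":
    # Prints every expected file or directory that is absent.
    for entry in missing_entries("Z-Image-Series", "below"):
        print("missing:", entry)
```

An empty result means every path named in the tree is present for that variant; it does not verify file contents or sizes.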
Model Variants
| Variant | base-model | transformer | Total Size | Target GPU |
|---|---|---|---|---|
| 50x-above | z-image-series-50x-above-base-model | z-image-turbo-50x-above-lighting.safetensors | ~6.5GB | RTX 50 series and above |
| 50x-below | z-image-series-50x-below-base-model | z-image-turbo-50x-below-lighting.safetensors | ~6.2GB | RTX 20/30/40 + datacenter (T4/A100/H100/H200) |
- 50x-above: Optimized for RTX 50 series (Blackwell) and above
- 50x-below: Optimized for RTX 20/30/40 series and all pre-Blackwell datacenter GPUs (T4, A100, H100, H200)
The base-model and transformer must use the same variant (both above or both below).
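One way to keep the two halves of a variant in sync is to derive both paths from a single variant name. The sketch below is illustrative only: `pick_variant` and `variant_paths` are hypothetical helpers, and the mapping from CUDA compute capability to variant is an assumption (Blackwell GPUs report major version 10 or higher, and Turing, the RTX 20 / T4 generation, reports 7.5), not something this repository specifies.

```python
def pick_variant(major, minor):
    """Map a CUDA compute capability to "above" or "below".

    Assumption, not from the repository: Blackwell parts report a major
    version of 10 or more, and Turing (RTX 20 series, T4) reports 7.5.
    """
    if (major, minor) < (7, 5):
        raise ValueError("GPUs older than the RTX 20 series are not supported")
    return "above" if major >= 10 else "below"


def variant_paths(variant, root="Z-Image-Series"):
    """Build the matching base-model and transformer paths for one variant."""
    if variant not in ("above", "below"):
        raise ValueError("variant must be 'above' or 'below'")
    return {
        "model_dir": f"{root}/z-image-series-50x-{variant}-base-model",
        "transformer": f"{root}/transformer/z-image-turbo-50x-{variant}-lighting.safetensors",
    }


if __name__ == "__main__":
    # Example: an RTX 4090 reports compute capability 8.9.
    print(variant_paths(pick_variant(8, 9)))
```

With PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` pair for the current device and can be passed straight to `pick_variant`. Because both paths come from the same variant string, the above and below files cannot be mixed by accident.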
Quick Start
Download
pip install huggingface_hub
from huggingface_hub import snapshot_download
model_dir = snapshot_download('QuantFunc/Z-Image-Series')
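Since only one variant is needed per machine, the download can be narrowed to that variant and placed in a local `Z-Image-Series/` folder. This is a sketch under stated assumptions: `variant_patterns` and `download_variant` are hypothetical helper names, and the glob patterns are derived from the directory structure listed earlier. `local_dir` and `allow_patterns` are existing `snapshot_download` parameters.

```python
def variant_patterns(variant):
    """Glob patterns selecting one variant's files (hypothetical helper)."""
    if variant not in ("above", "below"):
        raise ValueError("variant must be 'above' or 'below'")
    return [
        f"z-image-series-50x-{variant}-base-model/*",
        "transformer/config.json",
        f"transformer/z-image-turbo-50x-{variant}-lighting.safetensors",
    ]


def download_variant(variant, local_dir="Z-Image-Series"):
    """Download only the files for one variant into local_dir."""
    # Imported lazily so the pattern helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        "QuantFunc/Z-Image-Series",
        local_dir=local_dir,
        allow_patterns=variant_patterns(variant),
    )


if __name__ == "__main__":
    print(variant_patterns("below"))
    # Uncomment to perform the actual download (several GB):
    # download_variant("below")
```

Note that these patterns skip everything else in the repository, including the `precision-config/` samples; add a pattern for that folder if those files are wanted.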
Inference
# RTX 50 series
quantfunc \
--model-dir Z-Image-Series/z-image-series-50x-above-base-model \
--transformer Z-Image-Series/transformer/z-image-turbo-50x-above-lighting.safetensors \
--auto-optimize --model-backend lighting \
--prompt "a cute cat sitting on a windowsill watching rain" \
--output output.png --steps 4
# RTX 20/30/40 series
quantfunc \
--model-dir Z-Image-Series/z-image-series-50x-below-base-model \
--transformer Z-Image-Series/transformer/z-image-turbo-50x-below-lighting.safetensors \
--auto-optimize --model-backend lighting \
--prompt "a cute cat sitting on a windowsill watching rain" \
--output output.png --steps 4
--auto-optimize automatically selects the optimal VRAM management, attention backend, and quantization compression strategy based on your GPU.
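For scripted or batch use, the same command line can be assembled from Python. The sketch below only rebuilds the invocation shown above, using no flags beyond those already listed; `build_command` is a hypothetical helper, and it assumes the `quantfunc` executable is on `PATH`.

```python
import shlex


def build_command(model_dir, transformer, prompt, output="output.png", steps=4):
    """Return the quantfunc invocation from the Quick Start as an argument list."""
    return [
        "quantfunc",
        "--model-dir", model_dir,
        "--transformer", transformer,
        "--auto-optimize",
        "--model-backend", "lighting",
        "--prompt", prompt,
        "--output", output,
        "--steps", str(steps),
    ]


if __name__ == "__main__":
    example = build_command(
        "Z-Image-Series/z-image-series-50x-below-base-model",
        "Z-Image-Series/transformer/z-image-turbo-50x-below-lighting.safetensors",
        "a cute cat sitting on a windowsill watching rain",
    )
    # Show the shell-quoted form for inspection before running anything.
    print(shlex.join(example))
```

Passing the list to `subprocess.run(example, check=True)` would execute it; keeping the arguments as a list avoids shell-quoting problems with prompts that contain spaces or quotes.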
SVDQ & Lighting Backends
This repository provides Lighting backend models. Differences between the two backends:
| Feature | Lighting | SVDQ |
|---|---|---|
| Quantization | Per-layer mixed precision (FP4/INT4/FP8/INT8) | Nunchaku-based holistic pre-quantization |
| LoRA Integration | Real-time quantization: build a custom model in 5 minutes with zero speed loss, integrating any number of LoRAs | Runtime low-rank pathway |
| Ecosystem | QuantFunc native | Compatible with the widely-adopted Nunchaku ecosystem, enhanced with Rotation quantization and Auto Rank dynamic rank optimization |
| Flexibility | Per-layer precision control | Precision fixed at export time |
| Use Cases | Rapid personal model customization, batch LoRA integration | Leverage Nunchaku ecosystem, runtime dynamic LoRA |
Precision Config (precision-config/)
Sample per-layer precision configurations for the Lighting backend:
| File | Target GPU | Precision |
|---|---|---|
| 50x-above-fp4-sample.json | RTX 50+ | FP4 all layers |
| 50x-below-int4-sample.json | RTX 30/40 | INT4 all layers |
Related Repositories
- QuantFunc/Qwen-Image-Series: Qwen-Image text-to-image (60 layers)
- QuantFunc/Qwen-Image-Edit-Series: Qwen-Image-Edit image editing
License
The pre-quantized model weights in this repository are derived from the original models. Users must comply with the original model's license agreement. The QuantFunc inference engine and its plugins (including the ComfyUI plugin) are licensed separately; see official QuantFunc channels for details.
For models quantized from commercially licensed models, users are responsible for obtaining the necessary commercial licenses from the original model providers.
Community
Join our community for support, updates, and discussions:
- Discord server
- Scan the QR code below to join our WeChat group: