⚠️ License: Non-Commercial Use Only
These are quantized derivative weights of black-forest-labs/FLUX.2-klein-9B (FLUX.2 [klein] 9B), which is
licensed under the FLUX Non-Commercial License v2.1 by Black Forest Labs.
This FLUX Model is licensed by Black Forest Labs Inc. under the FLUX Non-Commercial License.
- Non-commercial use only. These weights may not be used for any commercial or revenue-generating purpose. Commercial use requires a separate license from Black Forest Labs (see https://bfl.ai/licensing).
- Full license: included as `LICENSE` (FLUX Non-Commercial License v2.1).
- Modifications: quantized from FLUX.2 [klein] 9B by the QuantFunc inference engine.
- This is not an official Black Forest Labs product and is not endorsed by BFL.
Disclaimer: Derived from FLUX.2 [klein] by Black Forest Labs. This is not an official Black Forest Labs product and is not endorsed by or affiliated with BFL. "FLUX" is a trademark of Black Forest Labs.
QuantFunc
🤗 Hugging Face | 🤖 ModelScope | 💻 GitHub | 💬 WeChat | 🎮 Discord
⚡ FLUX.2 Klein 9B: the highest-quality Klein tier, pre-quantized. Text-to-image and reference-based editing, 2x to 11x faster with the QuantFunc plugin.
The larger 9B Klein model for maximum fidelity, shipped as distilled (4-step) and base (28-step) transformers across three GPU tiers (50x FP4 · 40x INT4+FP8 · 30x-below INT4+INT8).
Powered by the QuantFunc ComfyUI plugin, the fastest diffusion inference engine:
- 2x to 11x speedup over standard BF16/FP16 Python pipelines (the weights are pre-exported, so loading is even faster).
- ⚙️ Native C++/CUDA (`libquantfunc.so` / `quantfunc.dll`) with zero Python model dependencies.
- 🧩 Dual engine (SVDQ offline + Lighting runtime 4-bit), zero-cost LoRA stacking, reference-image editing and inpainting.
- 🟢 Full GPU coverage: RTX 20/30/40/50 · A100/H100/H200/B100/B200/GB300 · RTX 6000 Ada / PRO Blackwell (CUDA 12 & 13); native FP4 on Blackwell.
Install the plugin: https://github.com/QuantFunc/ComfyUI-QuantFunc
Klein-9B-Series
Pre-quantized FLUX.2 Klein 9B model series by QuantFunc for the Lighting backend, supporting text-to-image and reference-based image editing.
✨ Both the distilled AND the non-distilled (base) model are supported, and the series ships three GPU tiers so every card gets the best path it can run:
`50x` (Blackwell, FP4) · `40x` (RTX 40 / Ada & Hopper, INT4 + FP8) · `30x-below` (RTX 30 and below, INT4 + INT8).
Overview
FLUX.2 Klein is part of Black Forest Labs' FLUX.2 family. This series covers the 9B variant (the larger, higher-quality one; transformer K=4096). QuantFunc ships, pre-quantized:
- Distilled transformer: 4-step, the fastest few-step option for generation and editing.
- Base / non-distilled transformer: the full 28-step model with classical CFG (`--guidance-scale 4.0`), highest quality.

Each transformer comes in three hardware tiers (below). Distilled and base share the same base-model directory; only the transformer file differs.
Hardware tiers (pick by GPU)
FP4 needs Blackwell (SM100 / SM120); FP8 needs Ada (SM89) or Hopper (SM90), e.g. RTX 40 / L40 / H100 / H200; INT4/INT8 run everywhere, including Ampere/Turing (e.g. RTX 30/20, A100). So:
| Tier | GPUs | attention + FFN | modulation/embedders/head | base-model |
|---|---|---|---|---|
| `50x` | Blackwell (SM100 / SM120): RTX 50 series, B100/B200/GB200, RTX PRO Blackwell | FP4 | FP8 | `klein-9b-series-50x-above-base-model` (FP4 text encoder) |
| `40x` | Ada (SM89) & Hopper (SM90): RTX 40 series, L40/L40S, H100, H200 | INT4 | FP8 | `klein-9b-series-50x-below-base-model` (INT4 text encoder) |
| `30x-below` | RTX 30 and below (pre-FP8): RTX 30/20, A100, A40, T4, down to RTX 2080 | INT4 | INT8 | `klein-9b-series-50x-below-base-model` (INT4 text encoder) |

`40x` and `30x-below` share the same INT4 base-model; they differ only in the transformer's 8-bit precision (FP8 vs INT8). `50x` uses the FP4 base-model.
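The tier rules above reduce to a lookup on the GPU's CUDA compute capability. The sketch below is not part of QuantFunc: `Tier`, `TIERS` and `pick_tier` are hypothetical names, and it assumes you already have the capability as a `(major, minor)` pair (for example from `torch.cuda.get_device_capability()` if PyTorch is installed). It also assumes architectures newer than Blackwell keep FP4 support.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    name: str        # tier suffix used in the transformer file names
    base_model: str  # base-model directory for this tier


# Hypothetical lookup mirroring the hardware-tier table (not a QuantFunc API).
TIERS = {
    "50x": Tier("50x", "klein-9b-series-50x-above-base-model"),
    "40x": Tier("40x", "klein-9b-series-50x-below-base-model"),
    "30x-below": Tier("30x-below", "klein-9b-series-50x-below-base-model"),
}


def pick_tier(major: int, minor: int) -> Tier:
    """Map a CUDA compute capability (major, minor) to a Klein-9B tier."""
    if major >= 10:
        # Blackwell: SM100 (B100/B200/GB200) and SM120 (RTX 50, RTX PRO).
        # Assumption: later architectures keep native FP4.
        return TIERS["50x"]
    if major == 9 or (major, minor) == (8, 9):
        # Hopper (SM90) and Ada (SM89) have FP8 tensor cores.
        return TIERS["40x"]
    # Ampere (SM80/SM86), Turing (SM75) and older: INT4 + INT8 only.
    return TIERS["30x-below"]
```

For example, an RTX 4090 reports `(8, 9)` and maps to `40x`, while an A100 reports `(8, 0)` and maps to `30x-below`.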
Directory Structure
```
Klein-9B-Series/
├── klein-9b-series-50x-above-base-model/   # FP4 text encoder + VAE (enc+dec) + tokenizer + scheduler (50x)
├── klein-9b-series-50x-below-base-model/   # INT4 text encoder + VAE (enc+dec) + tokenizer + scheduler (40x & 30x-below)
├── transformer/
│   ├── config.json
│   ├── klein-9b-50x-lighting.safetensors              # distilled, FP4 (50x)
│   ├── klein-9b-base-50x-lighting.safetensors         # base 28-step, FP4 (50x)
│   ├── klein-9b-40x-lighting.safetensors              # distilled, INT4 + FP8 (40x)
│   ├── klein-9b-base-40x-lighting.safetensors         # base 28-step, INT4 + FP8 (40x)
│   ├── klein-9b-30x-below-lighting.safetensors        # distilled, INT4 + INT8 (30x-below)
│   └── klein-9b-base-30x-below-lighting.safetensors   # base 28-step, INT4 + INT8 (30x-below)
└── precision-config/
    ├── 50x-fp4-f8-sample.json
    ├── 40x-int4-f8-sample.json
    └── 30x-below-int4-i8-sample.json
```
Status: All weights uploaded; the VAE includes both encoder and decoder. Every tier × {distilled, base} combination has been visually validated to generate correctly.
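Each tier needs only one base-model directory and one transformer file, so downloading the whole repository is usually unnecessary. The sketch below builds glob patterns for `huggingface_hub.snapshot_download(..., allow_patterns=...)` from the file names in the tree above. `transformer_file`, `allow_patterns_for` and `download` are hypothetical helpers, and including `transformer/config.json` is an assumption (it is small, so it is fetched to be safe).

```python
from typing import List

# Base-model directory per tier, taken from the directory tree above.
BASE_MODEL_DIR = {
    "50x": "klein-9b-series-50x-above-base-model",
    "40x": "klein-9b-series-50x-below-base-model",
    "30x-below": "klein-9b-series-50x-below-base-model",
}


def transformer_file(tier: str, variant: str) -> str:
    """Return the transformer path for a tier and a 'distilled' or 'base' variant."""
    if tier not in BASE_MODEL_DIR:
        raise ValueError(f"unknown tier: {tier!r}")
    if variant == "distilled":
        return f"transformer/klein-9b-{tier}-lighting.safetensors"
    if variant == "base":
        return f"transformer/klein-9b-base-{tier}-lighting.safetensors"
    raise ValueError(f"unknown variant: {variant!r}")


def allow_patterns_for(tier: str, variant: str) -> List[str]:
    """Glob patterns that select only the files one tier/variant needs."""
    return [
        f"{BASE_MODEL_DIR[tier]}/*",   # fnmatch's '*' also matches nested paths
        "transformer/config.json",     # assumed to be needed by the engine
        transformer_file(tier, variant),
    ]


def download(tier: str, variant: str, local_dir: str) -> str:
    """Fetch the selected files (requires `pip install huggingface_hub`)."""
    from huggingface_hub import snapshot_download  # imported lazily

    return snapshot_download(
        repo_id="QuantFunc/Klein-9B-Series",
        allow_patterns=allow_patterns_for(tier, variant),
        local_dir=local_dir,
    )
```

For example, `download("40x", "base", "Klein-9B-Series")` should fetch the INT4 base-model plus `klein-9b-base-40x-lighting.safetensors`, keeping the repository's relative layout.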
Distilled (4-step) vs Base (28-step)
| Transformer | Source | Steps | Guidance | Best for |
|---|---|---|---|---|
| `klein-9b-<tier>-lighting.safetensors` | Klein distilled | 4 | none (guidance-distilled) | Fastest |
| `klein-9b-base-<tier>-lighting.safetensors` | Klein base | 28 | `--guidance-scale 4.0` (classical CFG) | Highest quality |

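
The speed gap between the two rows comes from both the step count and the guidance mode. Below is a minimal sketch of classical classifier-free guidance as a general technique, not QuantFunc's internal code: each step mixes an unconditional and a conditional prediction, which typically costs two transformer evaluations, while a guidance-distilled model needs one. The function names are hypothetical, and predictions are plain lists for illustration.

```python
from typing import List, Sequence


def cfg_combine(uncond: Sequence[float], cond: Sequence[float], scale: float) -> List[float]:
    """Classical classifier-free guidance: uncond + scale * (cond - uncond)."""
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same shape")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def transformer_evals(steps: int, classical_cfg: bool) -> int:
    """Transformer evaluations per image: two per step with CFG, one without."""
    return steps * (2 if classical_cfg else 1)


distilled = transformer_evals(4, classical_cfg=False)  # 4
base = transformer_evals(28, classical_cfg=True)       # 56
print(f"distilled: {distilled} evals, base: {base} evals ({base // distilled}x)")
```

With the table's settings, that is 4 evaluations for the distilled model versus 28 × 2 = 56 for the base model, about 14× more transformer work. Engines often batch the two CFG branches, so wall-clock time need not scale exactly.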
Inference
```bash
# 50x: Blackwell (RTX 50 / B-series). Distilled, 4-step:
quantfunc --model-dir klein-9b-series-50x-above-base-model \
  --transformer transformer/klein-9b-50x-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 4 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

# 40x: RTX 40 / Ada or Hopper (H100/H200). Base 28-step (classical CFG):
quantfunc --model-dir klein-9b-series-50x-below-base-model \
  --transformer transformer/klein-9b-base-40x-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 28 --guidance-scale 4.0 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

# 30x-below: RTX 30 and below. Distilled, 4-step:
quantfunc --model-dir klein-9b-series-50x-below-base-model \
  --transformer transformer/klein-9b-30x-below-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 4 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png
```
--auto-optimize picks the VRAM/attention/compression strategy for your GPU. The ComfyUI Lighting plugin auto-selects the matching tier + precision-config.
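If you script many runs, it can help to build the command line from the tier and variant instead of editing it by hand. This is a sketch under assumptions: `quantfunc_argv` is a hypothetical helper that only combines the flags used in the commands above, and it assumes the `quantfunc` binary is on `PATH` and is run from the repository root, because the paths are relative.

```python
import shlex
import shutil
import subprocess
from typing import List

# Base-model directory per tier (from the hardware-tier table).
BASE_MODEL_DIR = {
    "50x": "klein-9b-series-50x-above-base-model",
    "40x": "klein-9b-series-50x-below-base-model",
    "30x-below": "klein-9b-series-50x-below-base-model",
}


def quantfunc_argv(tier: str, variant: str, prompt: str, output: str = "out.png") -> List[str]:
    """Build the quantfunc command line for one tier and variant (hypothetical helper)."""
    if variant == "distilled":
        transformer = f"transformer/klein-9b-{tier}-lighting.safetensors"
        sampling = ["--steps", "4"]  # guidance-distilled: no CFG flag
    elif variant == "base":
        transformer = f"transformer/klein-9b-base-{tier}-lighting.safetensors"
        sampling = ["--steps", "28", "--guidance-scale", "4.0"]  # classical CFG
    else:
        raise ValueError(f"unknown variant: {variant!r}")
    return [
        "quantfunc",
        "--model-dir", BASE_MODEL_DIR[tier],
        "--transformer", transformer,
        "--model-backend", "lighting", "--auto-optimize",
        *sampling,
        "--prompt", prompt,
        "--output", output,
    ]


if __name__ == "__main__":
    argv = quantfunc_argv("40x", "base", "a cute cat on a windowsill, watercolor style")
    print(shlex.join(argv))  # shell-quoted, copy-pasteable form
    if shutil.which("quantfunc"):  # only run if the engine is installed
        subprocess.run(argv, check=True)
```

Passing an argument list to `subprocess.run` avoids shell-quoting problems with prompts that contain quotes or commas.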
Precision Config (precision-config/)
| File | Tier / GPU | attention + FFN | islands (modulation/embedders/head) |
|---|---|---|---|
| `50x-fp4-f8-sample.json` | 50x: Blackwell (SM100 / SM120) | FP4 | FP8 |
| `40x-int4-f8-sample.json` | 40x: Ada (SM89) & Hopper (SM90): RTX 40, L40, H100, H200 | INT4 | FP8 |
| `30x-below-int4-i8-sample.json` | 30x-below: RTX 30/20, A100 (pre-FP8) | INT4 | INT8 |

These per-layer configs control the Lighting backend's quantization precision; customize them to set your own speed/quality trade-off.
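The sample files are ordinary JSON, so a cautious workflow is to copy the sample for your tier, edit the copy, and confirm it still parses before use. The sketch below does only that and makes no assumptions about the keys inside the files, whose schema is not described here. `copy_sample_config` and `load_config` are hypothetical names; how the engine or ComfyUI plugin is pointed at a custom file is left to their documentation.

```python
import json
import shutil
from pathlib import Path
from typing import Any, Union

PathLike = Union[str, Path]

# Sample file per tier, as listed in the table above.
SAMPLE_CONFIG = {
    "50x": "50x-fp4-f8-sample.json",
    "40x": "40x-int4-f8-sample.json",
    "30x-below": "30x-below-int4-i8-sample.json",
}


def load_config(path: PathLike) -> Any:
    """Parse a precision config; raises json.JSONDecodeError if it is malformed."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def copy_sample_config(repo_root: PathLike, tier: str, dest: PathLike) -> Path:
    """Copy the tier's sample config to `dest` so it can be edited safely."""
    src = Path(repo_root) / "precision-config" / SAMPLE_CONFIG[tier]
    load_config(src)  # fail early if the sample itself does not parse
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)
    return dest
```

After editing the copy, calling `load_config` on it again catches JSON syntax errors before the engine reads the file.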
Related Repositories
- QuantFunc/Klein-4B-Series: FLUX.2 Klein 4B
- QuantFunc/Qwen-Image-Series · QuantFunc/Qwen-Image-Edit-Series · QuantFunc/Z-Image-Series
License
The pre-quantized weights are derived from FLUX.2 Klein. Users must comply with the original Black Forest Labs FLUX.2 license. The QuantFunc inference engine and plugins are licensed separately.
Community
Join our community for support, updates, and discussions:
- 🎮 Discord server
- 💬 Scan the QR code below to join our WeChat group: