⚠️ License — Non-Commercial Use Only

These are quantized derivative weights of black-forest-labs/FLUX.2-klein-9B (FLUX.2 [klein] 9B), which is licensed under the FLUX Non-Commercial License v2.1 by Black Forest Labs.

This FLUX Model is licensed by Black Forest Labs Inc. under the FLUX Non-Commercial License.

Non-commercial use only. These weights may not be used for any commercial or revenue-generating purpose. Commercial use requires a separate license from Black Forest Labs — see https://bfl.ai/licensing .
Full license: included as LICENSE (FLUX Non-Commercial License v2.1).
Modifications: quantized from FLUX.2 [klein] 9B by the QuantFunc inference engine.
This is not an official Black Forest Labs product and is not endorsed by BFL.

Disclaimer: Derived from FLUX.2 [klein] by Black Forest Labs. This is not an official Black Forest Labs product and is not endorsed by or affiliated with BFL. "FLUX" is a trademark of Black Forest Labs.

QuantFunc

🤗 Hugging Face | 🤖 ModelScope | 💻 GitHub | 💬 WeChat (微信) | 🎮 Discord

⚡ FLUX.2 Klein 9B: the highest-quality Klein tier, pre-quantized. Text-to-image and reference-based editing at a 2x–11x speedup with the QuantFunc plugin.

The larger 9B Klein model for maximum fidelity, shipped as distilled (4-step) + base (28-step) transformers across three GPU tiers (50x FP4 · 40x INT4+FP8 · 30x-below INT4+INT8).

Powered by the QuantFunc ComfyUI plugin — the fastest diffusion inference engine:

🚀 2x–11x speedup over standard BF16/FP16 Python pipelines (pre-exported → even faster loading).
⚙️ Native C++/CUDA (libquantfunc.so / quantfunc.dll) with zero Python model dependencies.
🧩 Dual engine (SVDQ offline + Lighting runtime 4-bit), zero-cost LoRA stacking, reference-image editing & inpainting.
🟢 Full GPU coverage — RTX 20/30/40/50 · A100/H100/H200/B100/B200/GB300 · RTX 6000 Ada / PRO Blackwell (CUDA 12 & 13); native FP4 on Blackwell.

👉 Install the plugin: https://github.com/QuantFunc/ComfyUI-QuantFunc

Klein-9B-Series

Pre-quantized FLUX.2 Klein 9B model series by QuantFunc for the Lighting backend, covering text-to-image and reference-based image editing.

✨ Both the distilled AND the non-distilled (base) model are supported, and the series ships three GPU tiers so every card gets the best path it can run: 50x (Blackwell, FP4) · 40x (RTX 40 / Ada & Hopper, INT4 + FP8) · 30x-below (RTX 30 and below, INT4 + INT8).

Overview

FLUX.2 Klein is part of Black Forest Labs' FLUX.2 family. The 9B variant is the larger, higher-quality model (transformer K=4096). QuantFunc ships two pre-quantized transformers:

Distilled transformer — 4-step, fastest few-step generation/editing.
Base / non-distilled transformer — the full 28-step model with classical CFG (--guidance-scale 4.0), highest quality.

Each ships in three hardware tiers (see below). Within a tier, the distilled and base transformers use the same base-model directory; only the transformer file differs.

Hardware tiers (pick by GPU)

FP4 needs Blackwell (SM120). FP8 needs Ada (SM89) or Hopper (SM90), e.g. RTX 40 / L40 / H100 / H200. INT4 and INT8 run everywhere, including Ampere and Turing (e.g. RTX 30/20, A100). The tiers therefore map to GPUs as follows:

| Tier | GPUs | attention + FFN | modulation/embedders/head | base-model |
|---|---|---|---|---|
| `50x` | Blackwell (SM120+): RTX 50 series, B100/B200/GB200, RTX PRO Blackwell | FP4 | FP8 | `klein-9b-series-50x-above-base-model` (FP4 text encoder) |
| `40x` | RTX 40 / Ada (SM89) & Hopper (SM90): RTX 40 series, L40/L40S, H100, H200 | INT4 | FP8 | `klein-9b-series-50x-below-base-model` (INT4 text encoder) |
| `30x-below` | RTX 30 and below (pre-FP8): RTX 30/20, A100, A40, T4, down to RTX 2080 | INT4 | INT8 | `klein-9b-series-50x-below-base-model` (INT4 text encoder) |

40x and 30x-below share the same INT4 base-model — they differ only in the transformer's 8-bit precision (FP8 vs INT8). 50x uses the FP4 base-model.
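The tier choice above can be sketched as a small shell helper keyed on CUDA compute capability. This is an illustration only, not part of the QuantFunc tooling: `pick_tier` is a hypothetical name, and the 10.0 cutoff for the 50x tier is an assumption, chosen so that B100/B200 (which report compute capability 10.x) land in the tier the table lists for them.

```shell
# Hypothetical helper: map a "major.minor" compute capability to a tier name.
# Assumes the argument always contains a dot, e.g. "8.9" or "12.0".
pick_tier() (
  cap=$1
  major=${cap%%.*}            # text before the first dot
  minor=${cap#*.}             # text after the first dot
  code=$(( ${major} * 10 + ${minor} ))

  if [ "$code" -ge 100 ]; then
    echo "50x"                # Blackwell (assumed cutoff: 10.0 and up)
  elif [ "$code" -ge 89 ]; then
    echo "40x"                # Ada (8.9) and Hopper (9.0)
  else
    echo "30x-below"          # Ampere, Turing and older
  fi
)
```

On a machine with a recent NVIDIA driver, the capability string can usually be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`, so `pick_tier "$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"` would print the tier for the first GPU.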

Directory Structure

```
Klein-9B-Series/
├── klein-9b-series-50x-above-base-model/             # FP4 text encoder + VAE(enc+dec) + tokenizer + scheduler  (50x)
├── klein-9b-series-50x-below-base-model/             # INT4 text encoder + VAE(enc+dec) + tokenizer + scheduler (40x & 30x-below)
├── transformer/
│   ├── config.json
│   ├── klein-9b-50x-lighting.safetensors             # distilled, FP4           (50x)
│   ├── klein-9b-base-50x-lighting.safetensors        # base 28-step, FP4        (50x)
│   ├── klein-9b-40x-lighting.safetensors             # distilled, INT4 + FP8    (40x)
│   ├── klein-9b-base-40x-lighting.safetensors        # base 28-step, INT4 + FP8 (40x)
│   ├── klein-9b-30x-below-lighting.safetensors       # distilled, INT4 + INT8   (30x-below)
│   └── klein-9b-base-30x-below-lighting.safetensors  # base 28-step, INT4 + INT8 (30x-below)
└── precision-config/
    ├── 50x-fp4-f8-sample.json
    ├── 40x-int4-f8-sample.json
    └── 30x-below-int4-i8-sample.json
```

Status: ✓ All weights uploaded; the VAE includes both encoder and decoder. Every tier × {distilled, base} is visually validated to generate correctly.
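One way to confirm that a local download matches the layout above for a single tier is a small existence check. The sketch below is not part of QuantFunc: `missing_tier_files` is a hypothetical helper that takes the local repository root and a tier name, prints every expected path that is absent, and returns non-zero if anything is missing. It also checks the precision-config sample, on the assumption that you want the full per-tier set present.

```shell
# Hypothetical helper: list files from the tree above that are missing for a tier.
# Usage: missing_tier_files <repo-root> <50x|40x|30x-below>
missing_tier_files() (
  root=$1
  tier=$2

  case "$tier" in
    50x)       base=klein-9b-series-50x-above-base-model; cfg=50x-fp4-f8-sample.json ;;
    40x)       base=klein-9b-series-50x-below-base-model; cfg=40x-int4-f8-sample.json ;;
    30x-below) base=klein-9b-series-50x-below-base-model; cfg=30x-below-int4-i8-sample.json ;;
    *)         echo "unknown tier: $tier" >&2; exit 2 ;;
  esac

  status=0
  for path in \
    "$base" \
    "transformer/config.json" \
    "transformer/klein-9b-$tier-lighting.safetensors" \
    "transformer/klein-9b-base-$tier-lighting.safetensors" \
    "precision-config/$cfg"
  do
    if [ ! -e "$root/$path" ]; then
      echo "$path"            # report the missing relative path
      status=1
    fi
  done
  exit "$status"              # becomes the function's return code
)
```

For example, `missing_tier_files ./Klein-9B-Series 40x` prints nothing and returns 0 when the INT4 base-model directory, both 40x transformers, `config.json`, and the 40x sample config are all in place.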

Distilled (4-step) vs Base (28-step)

| Transformer | Source | Steps | Guidance | Best for |
|---|---|---|---|---|
| `klein-9b-<tier>-lighting.safetensors` | Klein distilled | 4 | none (guidance-distilled) | Fastest |
| `klein-9b-base-<tier>-lighting.safetensors` | Klein base | 28 | `--guidance-scale 4.0` (classical CFG) | Highest quality |

Inference

```shell
# 50x: Blackwell (RTX 50 / B-series). Distilled, 4-step:
quantfunc --model-dir klein-9b-series-50x-above-base-model \
  --transformer transformer/klein-9b-50x-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 4 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

# 40x: RTX 40 / Ada or Hopper (H100/H200). Base 28-step (classical CFG):
quantfunc --model-dir klein-9b-series-50x-below-base-model \
  --transformer transformer/klein-9b-base-40x-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 28 --guidance-scale 4.0 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

# 30x-below: RTX 30 and below. Distilled, 4-step:
quantfunc --model-dir klein-9b-series-50x-below-base-model \
  --transformer transformer/klein-9b-30x-below-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 4 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png
```

--auto-optimize picks the VRAM/attention/compression strategy for your GPU. The ComfyUI Lighting plugin auto-selects the matching tier + precision-config.
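The three commands above differ only in the base-model directory, the transformer file, and the step/guidance settings, so they can be generated from a tier and a variant. The sketch below shows one way to do that; `klein_args` is a hypothetical wrapper, not something shipped with QuantFunc, and it only emits flags that already appear in the examples above.

```shell
# Hypothetical helper: print the quantfunc flags for a tier and variant.
# Usage: klein_args <50x|40x|30x-below> <distilled|base>
klein_args() (
  tier=$1
  variant=$2

  # 50x uses the FP4 base-model; 40x and 30x-below share the INT4 one.
  case "$tier" in
    50x)           model_dir=klein-9b-series-50x-above-base-model ;;
    40x|30x-below) model_dir=klein-9b-series-50x-below-base-model ;;
    *)             echo "unknown tier: $tier" >&2; exit 1 ;;
  esac

  # Distilled runs 4 steps without guidance; base runs 28 steps with CFG 4.0.
  case "$variant" in
    distilled) file="klein-9b-$tier-lighting.safetensors"
               sampling="--steps 4" ;;
    base)      file="klein-9b-base-$tier-lighting.safetensors"
               sampling="--steps 28 --guidance-scale 4.0" ;;
    *)         echo "unknown variant: $variant" >&2; exit 1 ;;
  esac

  printf '%s\n' "--model-dir $model_dir --transformer transformer/$file --model-backend lighting --auto-optimize $sampling"
)
```

A call such as `quantfunc $(klein_args 40x base) --prompt "a cute cat on a windowsill, watercolor style" --output out.png` would then reproduce the second command above. The substitution is left unquoted on purpose so the shell splits it into separate arguments; that works here because none of the generated values contain spaces.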

Precision Config (precision-config/)

| File | Tier / GPU | attention + FFN | islands (modulation/embedders/head) |
|---|---|---|---|
| `50x-fp4-f8-sample.json` | 50x: Blackwell (SM120+) | FP4 | FP8 |
| `40x-int4-f8-sample.json` | 40x: Ada (SM89) & Hopper (SM90), i.e. RTX 40, L40, H100, H200 | INT4 | FP8 |
| `30x-below-int4-i8-sample.json` | 30x-below: RTX 30/20, A100 (pre-FP8) | INT4 | INT8 |

These per-layer configs control the Lighting backend's quantization precision — customize for your own speed/quality trade-off.

Related Repositories

QuantFunc/Klein-4B-Series — FLUX.2 Klein 4B
QuantFunc/Qwen-Image-Series · QuantFunc/Qwen-Image-Edit-Series · QuantFunc/Z-Image-Series

License

The pre-quantized weights are derived from FLUX.2 Klein, so users must comply with the FLUX Non-Commercial License v2.1 from Black Forest Labs (included as LICENSE). The QuantFunc inference engine and plugins are licensed separately.

Community

Join our community for support, updates, and discussions:

🎮 Discord server
💬 Scan the QR code below to join our WeChat group:
