License & Attribution

These are quantized derivative weights of black-forest-labs/FLUX.2-klein-4B (FLUX.2 [klein] 4B).

Modifications: the original weights were quantized (e.g. W4A4 / FP4 / INT4 / FP8) and repackaged for the QuantFunc inference engine — a "modification" under Apache-2.0 §4(b).
Upstream license: the base model is licensed under the Apache License 2.0, included here as LICENSE-APACHE; the upstream copyright and attribution notices are retained.
This derivative: the QuantFunc quantization & packaging are additionally provided under the QuantFunc Model License (see LICENSE).
This repository is not affiliated with or endorsed by the upstream model authors.

Disclaimer: Derived from FLUX.2 [klein] by Black Forest Labs. This is not an official Black Forest Labs product and is not endorsed by or affiliated with BFL. "FLUX" is a trademark of Black Forest Labs.

QuantFunc


⚡ FLUX.2 Klein 4B — pre-quantized for speed. The lighter, fast Klein variant for text-to-image and reference-based editing, running 2x–11x faster with the QuantFunc plugin.

This series ships the distilled (4-step) and base (28-step) FLUX.2 Klein 4B transformers across three GPU tiers — 50x FP4 (Blackwell), 40x INT4+FP8 (Ada/Hopper), 30x-below INT4+INT8 (RTX 30 and below) — so every card gets the fastest path it can run.

Powered by the QuantFunc ComfyUI plugin — the fastest diffusion inference engine:

🚀 2x–11x speedup over standard BF16/FP16 Python pipelines (pre-exported → even faster loading).
⚙️ Native C++/CUDA (libquantfunc.so / quantfunc.dll) with zero Python model dependencies.
🧩 Dual engine (SVDQ offline + Lighting runtime 4-bit), zero-cost LoRA stacking, reference-image editing & inpainting.
🟢 Full GPU coverage — RTX 20/30/40/50 · A100/H100/H200/B100/B200/GB300 · RTX 6000 Ada / PRO Blackwell (CUDA 12 & 13); native FP4 on Blackwell.

👉 Install the plugin: https://github.com/QuantFunc/ComfyUI-QuantFunc

Klein-4B-Series

Pre-quantized FLUX.2 Klein 4B model series by QuantFunc, Lighting backend. Text-to-image and reference-based image editing.

✨ Both the distilled AND the non-distilled (base) model are supported, and the series ships three GPU tiers so every card gets the best path it can run: 50x (Blackwell, FP4) · 40x (RTX 40 / Ada & Hopper, INT4 + FP8) · 30x-below (RTX 30 and below, INT4 + INT8).

Overview

FLUX.2 Klein belongs to Black Forest Labs' FLUX.2 family. The 4B variant is the lighter, faster model (transformer K=3072). QuantFunc ships two pre-quantized transformers:

Distilled transformer — 4-step, fastest few-step generation/editing.
Base / non-distilled transformer — the full 28-step model with classical CFG (--guidance-scale 4.0), highest quality.

Each comes in three hardware tiers (see below). Within a tier, distilled and base share the same base-model directory; only the transformer file differs.

Hardware tiers (pick by GPU)

FP4 needs Blackwell (SM120+). FP8 needs Ada (SM89) or Hopper (SM90), e.g. RTX 40 / L40 / H100 / H200. INT4 and INT8 run on every supported GPU, including Ampere and Turing (e.g. RTX 30/20, A100). So:

| Tier | GPUs | Attention + FFN | Modulation / embedders / head | Base-model |
|---|---|---|---|---|
| `50x` | Blackwell (SM120+): RTX 50 series, B100/B200/GB200, RTX PRO Blackwell | FP4 | FP8 | `klein-4b-series-50x-above-base-model` (FP4 text encoder) |
| `40x` | RTX 40 / Ada (SM89) & Hopper (SM90): RTX 40 series, L40/L40S, H100, H200 | INT4 | FP8 | `klein-4b-series-50x-below-base-model` (INT4 text encoder) |
| `30x-below` | RTX 30 and below (pre-FP8): RTX 30/20, A100, A40, T4, down to RTX 2080 | INT4 | INT8 | `klein-4b-series-50x-below-base-model` (INT4 text encoder) |

40x and 30x-below share the same INT4 base-model — they differ only in the transformer's 8-bit precision (FP8 vs INT8). 50x uses the FP4 base-model.
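The tier rules above reduce to a comparison on the GPU's CUDA compute capability. The following is a minimal sketch, assuming the capability is passed as `major.minor` (for example `8.9`). `pick_tier` and `base_model_dir` are hypothetical helper names and are not part of QuantFunc. The 10.0 cut-off for `50x` is an assumption, chosen so that B100/B200 (listed under `50x` in the table) land in that tier; the document itself names SM120 as the FP4 requirement, so verify the result for data-center Blackwell parts.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of QuantFunc): map a CUDA compute
# capability such as "8.9" to the tier names used in this series.

pick_tier() {
  local cap="$1" major minor sm
  major="${cap%%.*}"
  case "$cap" in
    *.*) minor="${cap#*.}" ;;
    *)   minor=0 ;;
  esac
  sm=$(( major * 10 + minor ))
  if [ "$sm" -ge 100 ]; then
    echo "50x"          # Blackwell: FP4 + FP8 (the 100 cut-off is an assumption)
  elif [ "$sm" -ge 89 ]; then
    echo "40x"          # Ada (8.9) / Hopper (9.0): INT4 + FP8
  else
    echo "30x-below"    # Ampere, Turing and older: INT4 + INT8
  fi
}

# Base-model directory for a tier, as listed in the table above.
base_model_dir() {
  case "$1" in
    50x)           echo "klein-4b-series-50x-above-base-model" ;;
    40x|30x-below) echo "klein-4b-series-50x-below-base-model" ;;
    *) echo "unknown tier: $1" >&2; return 1 ;;
  esac
}

pick_tier 8.9                      # prints 40x
base_model_dir "$(pick_tier 8.6)"  # prints klein-4b-series-50x-below-base-model
```

On recent NVIDIA drivers the capability can usually be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`; on multi-GPU machines that prints one line per device, so pick the device you intend to run on.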

Directory Structure

Klein-4B-Series/
├── klein-4b-series-50x-above-base-model/   # FP4 text encoder + VAE (enc+dec) + tokenizer + scheduler  (50x)
├── klein-4b-series-50x-below-base-model/   # INT4 text encoder + VAE (enc+dec) + tokenizer + scheduler (40x & 30x-below)
├── transformer/
│   ├── config.json
│   ├── klein-4b-50x-lighting.safetensors             # distilled, FP4 (50x)
│   ├── klein-4b-base-50x-lighting.safetensors        # base 28-step, FP4 (50x)
│   ├── klein-4b-40x-lighting.safetensors             # distilled, INT4 + FP8 (40x)
│   ├── klein-4b-base-40x-lighting.safetensors        # base 28-step, INT4 + FP8 (40x)
│   ├── klein-4b-30x-below-lighting.safetensors       # distilled, INT4 + INT8 (30x-below)
│   └── klein-4b-base-30x-below-lighting.safetensors  # base 28-step, INT4 + INT8 (30x-below)
└── precision-config/
    ├── 50x-fp4-f8-sample.json
    ├── 40x-int4-f8-sample.json
    └── 30x-below-int4-i8-sample.json

Status: ✓ All weights uploaded; the VAE includes both encoder and decoder. Every tier × {distilled, base} is visually validated to generate correctly.
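Since each tier needs only one base-model directory and one transformer file, a partial download is enough, and it is easy to end up with a mismatched pair. The sketch below checks a local copy against the layout above. `check_layout` is a hypothetical helper (not part of QuantFunc), and it only tests that the paths exist, not that the files are complete.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QuantFunc): report which paths needed
# for one tier/variant are missing under a local copy of the series.
# Usage: check_layout <root> <50x|40x|30x-below> <distilled|base>
check_layout() {
  local root="$1" tier="$2" variant="$3" base infix missing=0 path
  case "$tier" in
    50x)           base="klein-4b-series-50x-above-base-model" ;;
    40x|30x-below) base="klein-4b-series-50x-below-base-model" ;;
    *) echo "unknown tier: $tier" >&2; return 2 ;;
  esac
  case "$variant" in
    distilled) infix="" ;;
    base)      infix="base-" ;;
    *) echo "unknown variant: $variant" >&2; return 2 ;;
  esac
  for path in \
    "$base" \
    "transformer/config.json" \
    "transformer/klein-4b-${infix}${tier}-lighting.safetensors"
  do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_layout ./Klein-4B-Series 40x distilled` prints one `missing:` line per absent path and returns non-zero if anything is missing.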

Distilled (4-step) vs Base (28-step)

| Transformer | Source | Steps | Guidance | Best for |
|---|---|---|---|---|
| `klein-4b-<tier>-lighting.safetensors` | Klein distilled | 4 | none (guidance-distilled) | Fastest |
| `klein-4b-base-<tier>-lighting.safetensors` | Klein base | 28 | `--guidance-scale 4.0` (classical CFG) | Highest quality |

Inference

# 50x — Blackwell (RTX 50 / B-series). Distilled, 4-step:
quantfunc --model-dir klein-4b-series-50x-above-base-model \
  --transformer transformer/klein-4b-50x-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 4 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

# 40x — RTX 40 / Ada or Hopper (H100/H200). Base 28-step (classical CFG):
quantfunc --model-dir klein-4b-series-50x-below-base-model \
  --transformer transformer/klein-4b-base-40x-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 28 --guidance-scale 4.0 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

# 30x-below — RTX 30 and below. Distilled, 4-step:
quantfunc --model-dir klein-4b-series-50x-below-base-model \
  --transformer transformer/klein-4b-30x-below-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 4 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

--auto-optimize picks the VRAM/attention/compression strategy for your GPU. The ComfyUI Lighting plugin auto-selects the matching tier + precision-config.
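The three commands above differ only in the base-model directory, the transformer file, and the sampling flags. As a sketch, a small wrapper can assemble the command from a tier and a variant. `klein_cmd` is a hypothetical helper (not part of QuantFunc) and uses only the flags that appear in the commands above. It prints the command for inspection rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of QuantFunc): print the quantfunc command
# for a tier/variant, using only flags shown in this document.
# Usage: klein_cmd <50x|40x|30x-below> <distilled|base> <prompt> [output]
klein_cmd() {
  local tier="$1" variant="$2" prompt="$3" output="${4:-out.png}"
  local base transformer sampling
  case "$tier" in
    50x)           base="klein-4b-series-50x-above-base-model" ;;
    40x|30x-below) base="klein-4b-series-50x-below-base-model" ;;
    *) echo "unknown tier: $tier" >&2; return 1 ;;
  esac
  case "$variant" in
    distilled)
      transformer="transformer/klein-4b-${tier}-lighting.safetensors"
      sampling="--steps 4" ;;                       # guidance-distilled
    base)
      transformer="transformer/klein-4b-base-${tier}-lighting.safetensors"
      sampling="--steps 28 --guidance-scale 4.0" ;; # classical CFG
    *) echo "unknown variant: $variant" >&2; return 1 ;;
  esac
  # The prompt is wrapped in double quotes as-is; prompts that themselves
  # contain double quotes would need escaping before the output is reused.
  printf 'quantfunc --model-dir %s --transformer %s --model-backend lighting --auto-optimize %s --prompt "%s" --output %s\n' \
    "$base" "$transformer" "$sampling" "$prompt" "$output"
}

klein_cmd 40x base "a cute cat on a windowsill, watercolor style"
```

The call at the end prints the same command as the 40x example above, on a single line.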

Precision Config (precision-config/)

| File | Tier / GPU | Attention + FFN | 8-bit islands (modulation / embedders / head) |
|---|---|---|---|
| `50x-fp4-f8-sample.json` | 50x: Blackwell (SM120+) | FP4 | FP8 |
| `40x-int4-f8-sample.json` | 40x: Ada (SM89) & Hopper (SM90), i.e. RTX 40, L40, H100, H200 | INT4 | FP8 |
| `30x-below-int4-i8-sample.json` | 30x-below: RTX 30/20, A100 (pre-FP8) | INT4 | INT8 |

These per-layer configs control the Lighting backend's quantization precision — customize for your own speed/quality trade-off.

Related Repositories

QuantFunc/Klein-9B-Series — FLUX.2 Klein 9B
QuantFunc/Qwen-Image-Series · QuantFunc/Qwen-Image-Edit-Series · QuantFunc/Z-Image-Series

License

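The pre-quantized weights are derived from FLUX.2 [klein] 4B. Users must comply with the upstream Apache License 2.0 (LICENSE-APACHE) and with the QuantFunc Model License (LICENSE), which covers the quantization and packaging. The QuantFunc inference engine and plugins are licensed separately.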

Community

Join our community for support, updates, and discussions:

🎮 Discord server
💬 Scan the QR code below to join our WeChat group:
