How to use from the Diffusers library
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("QuantFunc/Klein-4B-Series", dtype=torch.bfloat16, device_map="cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]

License & Attribution

These are quantized derivative weights of black-forest-labs/FLUX.2-klein-4B (FLUX.2 [klein] 4B).

  • Modifications: the original weights were quantized (e.g. W4A4 / FP4 / INT4 / FP8) and repackaged for the QuantFunc inference engine, which is a "modification" under Apache-2.0 §4(b).
  • Upstream license: the base model is licensed under the Apache License 2.0, included here as LICENSE-APACHE; the upstream copyright and attribution notices are retained.
  • This derivative: the QuantFunc quantization & packaging are additionally provided under the QuantFunc Model License (see LICENSE).
  • This repository is not affiliated with or endorsed by the upstream model authors.

Disclaimer: Derived from FLUX.2 [klein] by Black Forest Labs. This is not an official Black Forest Labs product and is not endorsed by or affiliated with BFL. "FLUX" is a trademark of Black Forest Labs.

QuantFunc

🤗 Hugging Face  |  🤖 ModelScope  |  💻 GitHub  |  💬 WeChat (微信)  |  🎮 Discord

⚡ FLUX.2 Klein 4B, pre-quantized for speed. The lighter, fast Klein variant for text-to-image and reference-based editing, running 2x to 11x faster with the QuantFunc plugin.

This series ships the distilled (4-step) and base (28-step) FLUX.2 Klein 4B transformers across three GPU tiers: 50x FP4 (Blackwell), 40x INT4+FP8 (Ada/Hopper), and 30x-below INT4+INT8 (RTX 30 and below), so every card gets the fastest path it can run.

Powered by the QuantFunc ComfyUI plugin, the fastest diffusion inference engine:

  • 🚀 2x to 11x speedup over standard BF16/FP16 Python pipelines (pre-exported → even faster loading).
  • ⚙️ Native C++/CUDA (libquantfunc.so / quantfunc.dll) with zero Python model dependencies.
  • 🧩 Dual engine (SVDQ offline + Lighting runtime 4-bit), zero-cost LoRA stacking, reference-image editing & inpainting.
  • 🟢 Full GPU coverage: RTX 20/30/40/50 · A100/H100/H200/B100/B200/GB300 · RTX 6000 Ada / PRO Blackwell (CUDA 12 & 13); native FP4 on Blackwell.

👉 Install the plugin: https://github.com/QuantFunc/ComfyUI-QuantFunc

Klein-4B-Series

Pre-quantized FLUX.2 Klein 4B model series by QuantFunc, Lighting backend. Text-to-image and reference-based image editing.

✨ Both the distilled AND the non-distilled (base) model are supported, and the series ships three GPU tiers so every card gets the best path it can run: 50x (Blackwell, FP4) · 40x (RTX 40 / Ada & Hopper, INT4 + FP8) · 30x-below (RTX 30 and below, INT4 + INT8).

Overview

FLUX.2 Klein belongs to Black Forest Labs' FLUX.2 family. The 4B variant is the lighter, fast one (transformer K=3072). QuantFunc ships two pre-quantized transformers:

  • Distilled transformer: 4-step, fastest few-step generation/editing.
  • Base / non-distilled transformer: the full 28-step model with classical CFG (--guidance-scale 4.0), highest quality.

Each comes in 3 hardware tiers (below). Distilled and base share the same base-model; only the transformer file differs.

Hardware tiers (pick by GPU)

FP4 needs Blackwell (SM120); FP8 needs Ada (SM89) or Hopper (SM90), e.g. RTX 40 / L40 / H100 / H200; INT4/INT8 run everywhere (Ampere/Turing, e.g. RTX 30/20, A100). So:

| Tier | GPUs | attention + FFN | modulation/embedders/head | base-model |
|---|---|---|---|---|
| 50x | Blackwell (SM120+): RTX 50 series, B100/B200/GB200, RTX PRO Blackwell | FP4 | FP8 | klein-4b-series-50x-above-base-model (FP4 text encoder) |
| 40x | RTX 40 / Ada (SM89) & Hopper (SM90): RTX 40 series, L40/L40S, H100, H200 | INT4 | FP8 | klein-4b-series-50x-below-base-model (INT4 text encoder) |
| 30x-below | RTX 30 and below (pre-FP8): RTX 30/20, A100, A40, T4, down to RTX 2080 | INT4 | INT8 | klein-4b-series-50x-below-base-model (INT4 text encoder) |

40x and 30x-below share the same INT4 base-model; they differ only in the transformer's 8-bit precision (FP8 vs INT8). 50x uses the FP4 base-model.
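The tier rule above can be sketched as a small lookup keyed on CUDA compute capability. This is a minimal illustration and not part of QuantFunc: `pick_tier` and `TierChoice` are hypothetical names, and treating any compute capability of 10.0 or higher as Blackwell is an assumption (data-center Blackwell parts report 10.x, while RTX 50 cards report 12.x). Directory and file names are taken from this repository's layout.

```python
from typing import NamedTuple


class TierChoice(NamedTuple):
    tier: str            # "50x", "40x" or "30x-below"
    base_model_dir: str  # text encoder + VAE + tokenizer + scheduler
    transformer: str     # transformer file, relative to the repository root


def pick_tier(major: int, minor: int, base: bool = False) -> TierChoice:
    """Map a CUDA compute capability to a tier (hypothetical helper).

    Assumption: capability >= 10.0 is Blackwell (FP4), >= 8.9 is Ada or
    Hopper (INT4 + FP8), and anything older uses INT4 + INT8.
    """
    if major >= 10:
        tier, model_dir = "50x", "klein-4b-series-50x-above-base-model"
    elif (major, minor) >= (8, 9):
        tier, model_dir = "40x", "klein-4b-series-50x-below-base-model"
    else:
        tier, model_dir = "30x-below", "klein-4b-series-50x-below-base-model"
    variant = "base-" if base else ""
    transformer = f"transformer/klein-4b-{variant}{tier}-lighting.safetensors"
    return TierChoice(tier, model_dir, transformer)


if __name__ == "__main__":
    # Example capabilities: an Ada card reports (8, 9), an A100 reports (8, 0).
    print(pick_tier(8, 9))
    print(pick_tier(8, 0, base=True))
```

With PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` pair that this helper expects.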

Directory Structure

Klein-4B-Series/
├── klein-4b-series-50x-above-base-model/   # FP4 text encoder + VAE (enc+dec) + tokenizer + scheduler   (50x)
├── klein-4b-series-50x-below-base-model/   # INT4 text encoder + VAE (enc+dec) + tokenizer + scheduler  (40x & 30x-below)
├── transformer/
│   ├── config.json
│   ├── klein-4b-50x-lighting.safetensors             # distilled, FP4            (50x)
│   ├── klein-4b-base-50x-lighting.safetensors        # base 28-step, FP4         (50x)
│   ├── klein-4b-40x-lighting.safetensors             # distilled, INT4 + FP8     (40x)
│   ├── klein-4b-base-40x-lighting.safetensors        # base 28-step, INT4 + FP8  (40x)
│   ├── klein-4b-30x-below-lighting.safetensors       # distilled, INT4 + INT8    (30x-below)
│   └── klein-4b-base-30x-below-lighting.safetensors  # base 28-step, INT4 + INT8 (30x-below)
└── precision-config/
    ├── 50x-fp4-f8-sample.json
    ├── 40x-int4-f8-sample.json
    └── 30x-below-int4-i8-sample.json

Status: ✓ All weights uploaded; the VAE includes both encoder and decoder. Every tier × {distilled, base} is visually validated to generate correctly.
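Since a given GPU needs only one base-model directory, one transformer file, and one precision config, a selective download can skip the rest of the tree. The sketch below builds the glob patterns for one tier. `download_patterns` and `download` are hypothetical helpers, not QuantFunc tooling; the `download` function assumes the `huggingface_hub` package is installed and relies on the `allow_patterns` argument of its `snapshot_download` function.

```python
# Names below mirror the directory structure of this repository.
BASE_MODEL_DIRS = {
    "50x": "klein-4b-series-50x-above-base-model",
    "40x": "klein-4b-series-50x-below-base-model",
    "30x-below": "klein-4b-series-50x-below-base-model",
}
PRECISION_CONFIGS = {
    "50x": "50x-fp4-f8-sample.json",
    "40x": "40x-int4-f8-sample.json",
    "30x-below": "30x-below-int4-i8-sample.json",
}


def download_patterns(tier: str, base: bool = False) -> list[str]:
    """Return glob patterns covering the files one tier needs (hypothetical helper)."""
    if tier not in BASE_MODEL_DIRS:
        raise ValueError(f"unknown tier: {tier!r}")
    variant = "base-" if base else ""
    return [
        f"{BASE_MODEL_DIRS[tier]}/*",
        "transformer/config.json",
        f"transformer/klein-4b-{variant}{tier}-lighting.safetensors",
        f"precision-config/{PRECISION_CONFIGS[tier]}",
    ]


def download(tier: str, base: bool = False, local_dir: str = "Klein-4B-Series") -> str:
    """Fetch only the selected files; requires network access and huggingface_hub."""
    from huggingface_hub import snapshot_download  # imported lazily on purpose

    return snapshot_download(
        repo_id="QuantFunc/Klein-4B-Series",
        allow_patterns=download_patterns(tier, base),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    for pattern in download_patterns("40x", base=True):
        print(pattern)
```

Calling `download("40x", base=True)` would then place the matching files under `Klein-4B-Series/`, keeping the relative paths used in the inference commands.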

Distilled (4-step) vs Base (28-step)

| Transformer | Source | Steps | Guidance | Best for |
|---|---|---|---|---|
| klein-4b-<tier>-lighting.safetensors | Klein distilled | 4 | none (guidance-distilled) | Fastest |
| klein-4b-base-<tier>-lighting.safetensors | Klein base | 28 | --guidance-scale 4.0 (classical CFG) | Highest quality |

Inference

# 50x: Blackwell (RTX 50 / B-series). Distilled, 4-step:
quantfunc --model-dir klein-4b-series-50x-above-base-model \
  --transformer transformer/klein-4b-50x-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 4 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

# 40x: RTX 40 / Ada or Hopper (H100/H200). Base 28-step (classical CFG):
quantfunc --model-dir klein-4b-series-50x-below-base-model \
  --transformer transformer/klein-4b-base-40x-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 28 --guidance-scale 4.0 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

# 30x-below: RTX 30 and below. Distilled, 4-step:
quantfunc --model-dir klein-4b-series-50x-below-base-model \
  --transformer transformer/klein-4b-30x-below-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 4 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

--auto-optimize picks the VRAM/attention/compression strategy for your GPU. The ComfyUI Lighting plugin auto-selects the matching tier + precision-config.
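The three commands above differ only in the base-model directory, the transformer file, and the sampling settings, so they can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in this section; `build_command` is a hypothetical helper, and it assumes the `quantfunc` executable is on your `PATH`.

```python
import shlex


def build_command(
    model_dir: str,
    transformer: str,
    prompt: str,
    base: bool = False,
    output: str = "out.png",
) -> list[str]:
    """Assemble a quantfunc argument list (hypothetical helper).

    Distilled transformers run 4 steps without guidance; base transformers
    run 28 steps with classical CFG at --guidance-scale 4.0.
    """
    steps = 28 if base else 4
    cmd = [
        "quantfunc",
        "--model-dir", model_dir,
        "--transformer", transformer,
        "--model-backend", "lighting",
        "--auto-optimize",
        "--steps", str(steps),
    ]
    if base:
        cmd += ["--guidance-scale", "4.0"]
    cmd += ["--prompt", prompt, "--output", output]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "klein-4b-series-50x-below-base-model",
        "transformer/klein-4b-base-40x-lighting.safetensors",
        "a cute cat on a windowsill, watercolor style",
        base=True,
    )
    # Print a shell-quoted form for inspection instead of running it.
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would launch the generation; returning a list rather than a string avoids shell-quoting problems with prompts that contain spaces or quotes.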

Precision Config (precision-config/)

| File | Tier / GPU | attention+FFN | islands |
|---|---|---|---|
| 50x-fp4-f8-sample.json | 50x: Blackwell (SM120+) | FP4 | FP8 |
| 40x-int4-f8-sample.json | 40x: Ada (SM89) & Hopper (SM90): RTX 40, L40, H100, H200 | INT4 | FP8 |
| 30x-below-int4-i8-sample.json | 30x-below: RTX 30/20, A100 (pre-FP8) | INT4 | INT8 |

These per-layer configs control the Lighting backend's quantization precision; customize them for your own speed/quality trade-off.

Related Repositories

License

The pre-quantized weights are derived from FLUX.2 Klein. Users must comply with the original Black Forest Labs FLUX.2 license. The QuantFunc inference engine and plugins are licensed separately.

Community

Join our community for support, updates, and discussions:

  • 🎮 Discord server
  • 💬 Scan the QR code below to join our WeChat group: