โš ๏ธ License โ€” Non-Commercial Use Only

These are quantized derivative weights of black-forest-labs/FLUX.2-klein-9B (FLUX.2 [klein] 9B), which is licensed under the FLUX Non-Commercial License v2.1 by Black Forest Labs.

This FLUX Model is licensed by Black Forest Labs Inc. under the FLUX Non-Commercial License.

  • Non-commercial use only. These weights may not be used for any commercial or revenue-generating purpose. Commercial use requires a separate license from Black Forest Labs; see https://bfl.ai/licensing.
  • Full license: included as LICENSE (FLUX Non-Commercial License v2.1).
  • Modifications: quantized from FLUX.2 [klein] 9B by the QuantFunc inference engine.
  • This is not an official Black Forest Labs product and is not endorsed by BFL.

Disclaimer: Derived from FLUX.2 [klein] by Black Forest Labs. This is not an official Black Forest Labs product and is not endorsed by or affiliated with BFL. "FLUX" is a trademark of Black Forest Labs.

QuantFunc

🤗 Hugging Face  |  🤖 ModelScope  |  💻 GitHub  |  💬 WeChat (微信)  |  🎮 Discord

⚡ FLUX.2 Klein 9B: the highest-quality Klein tier, pre-quantized. Text-to-image and reference-based editing run 2x–11x faster with the QuantFunc plugin.

The larger 9B Klein model for maximum fidelity, shipped as distilled (4-step) and base (28-step) transformers across three GPU tiers (50x: FP4 · 40x: INT4+FP8 · 30x-below: INT4+INT8).

Powered by the QuantFunc ComfyUI plugin, the fastest diffusion inference engine:

  • 🚀 2x–11x speedup over standard BF16/FP16 Python pipelines (pre-exported weights load even faster).
  • ⚙️ Native C++/CUDA (libquantfunc.so / quantfunc.dll) with zero Python model dependencies.
  • 🧩 Dual engine (SVDQ offline + Lighting runtime 4-bit), zero-cost LoRA stacking, reference-image editing & inpainting.
  • 🟢 Full GPU coverage: RTX 20/30/40/50 · A100/H100/H200/B100/B200/GB300 · RTX 6000 Ada / PRO Blackwell (CUDA 12 & 13); native FP4 on Blackwell.

👉 Install the plugin: https://github.com/QuantFunc/ComfyUI-QuantFunc

Klein-9B-Series

Pre-quantized FLUX.2 Klein 9B model series by QuantFunc, Lighting backend. Text-to-image and reference-based image editing.

✨ Both the distilled and the non-distilled (base) model are supported, and the series ships three GPU tiers so every card gets the best path it can run: 50x (Blackwell, FP4) · 40x (RTX 40 / Ada & Hopper, INT4 + FP8) · 30x-below (RTX 30 and below, INT4 + INT8).

Overview

FLUX.2 Klein is a model line in Black Forest Labs' FLUX.2 family. This series covers the 9B variant, the larger, higher-quality Klein model (transformer K=4096). QuantFunc ships two pre-quantized transformers:

  • Distilled transformer: 4-step, the fastest few-step generation/editing.
  • Base / non-distilled transformer: the full 28-step model with classical CFG (--guidance-scale 4.0), highest quality.

Each is available in three hardware tiers (see below). Within a tier, the distilled and base transformers use the same base-model directory; only the transformer file differs.

Hardware tiers (pick by GPU)

FP4 needs Blackwell (SM120); FP8 needs Ada (SM89) or Hopper (SM90), e.g. RTX 40 / L40 / H100 / H200; INT4/INT8 run everywhere (Ampere/Turing, e.g. RTX 30/20, A100). So:

| Tier | GPUs | attention + FFN | modulation/embedders/head | base-model |
|---|---|---|---|---|
| 50x | Blackwell (SM120+): RTX 50 series, B100/B200/GB200, RTX PRO Blackwell | FP4 | FP8 | klein-9b-series-50x-above-base-model (FP4 text encoder) |
| 40x | RTX 40 / Ada (SM89) & Hopper (SM90): RTX 40 series, L40/L40S, H100, H200 | INT4 | FP8 | klein-9b-series-50x-below-base-model (INT4 text encoder) |
| 30x-below | RTX 30 and below (pre-FP8): RTX 30/20, A100, A40, T4, down to RTX 2080 | INT4 | INT8 | klein-9b-series-50x-below-base-model (INT4 text encoder) |

40x and 30x-below share the same INT4 base-model; they differ only in the transformer's 8-bit precision (FP8 vs INT8). 50x uses the FP4 base-model.
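
The tier rule above can be sketched as a small shell helper. This is only an illustration, not part of QuantFunc: `pick_tier` is a hypothetical name, and the 10.0 cutoff for the 50x tier is an assumption chosen so that data-center Blackwell parts (which the table lists under 50x) are included alongside SM120 cards.

```shell
# Map a CUDA compute capability string (e.g. "8.9") to a tier name.
# Hypothetical helper; the thresholds follow the tier table above,
# with ">= 10.0 means Blackwell" as an assumption.
pick_tier() {
  cap="$1"
  major="${cap%%.*}"   # text before the first dot
  minor="${cap#*.}"    # text after the first dot
  sm=$(( major * 10 + minor ))

  if [ "$sm" -ge 100 ]; then
    echo "50x"         # Blackwell: FP4 + FP8
  elif [ "$sm" -ge 89 ]; then
    echo "40x"         # Ada (8.9) / Hopper (9.0): INT4 + FP8
  else
    echo "30x-below"   # Ampere, Turing and older: INT4 + INT8
  fi
}

# Example: pick_tier 8.6
```

On drivers whose `nvidia-smi` exposes the `compute_cap` query field, the input could come from `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`; on older drivers the capability has to be looked up by GPU model instead.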

Directory Structure

Klein-9B-Series/
├── klein-9b-series-50x-above-base-model/   # FP4 text encoder + VAE (enc+dec) + tokenizer + scheduler   (50x)
├── klein-9b-series-50x-below-base-model/   # INT4 text encoder + VAE (enc+dec) + tokenizer + scheduler  (40x & 30x-below)
├── transformer/
│   ├── config.json
│   ├── klein-9b-50x-lighting.safetensors             # distilled, FP4              (50x)
│   ├── klein-9b-base-50x-lighting.safetensors        # base 28-step, FP4           (50x)
│   ├── klein-9b-40x-lighting.safetensors             # distilled, INT4 + FP8       (40x)
│   ├── klein-9b-base-40x-lighting.safetensors        # base 28-step, INT4 + FP8    (40x)
│   ├── klein-9b-30x-below-lighting.safetensors       # distilled, INT4 + INT8      (30x-below)
│   └── klein-9b-base-30x-below-lighting.safetensors  # base 28-step, INT4 + INT8   (30x-below)
└── precision-config/
    ├── 50x-fp4-f8-sample.json
    ├── 40x-int4-f8-sample.json
    └── 30x-below-int4-i8-sample.json

Status: ✓ All weights uploaded; the VAE includes both encoder and decoder. Every tier × {distilled, base} combination has been visually validated to generate correctly.
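
To confirm that a local download matches the tree above, a check along these lines may help. It is a minimal sketch: `check_layout` is a hypothetical helper, and it only tests that each listed path exists, not that the files are complete or uncorrupted.

```shell
# Report which entries from the directory tree are missing under a root.
# Returns 0 when every entry exists, 1 otherwise. Hypothetical helper.
check_layout() {
  root="$1"
  missing=0
  for rel in \
    klein-9b-series-50x-above-base-model \
    klein-9b-series-50x-below-base-model \
    transformer/config.json \
    transformer/klein-9b-50x-lighting.safetensors \
    transformer/klein-9b-base-50x-lighting.safetensors \
    transformer/klein-9b-40x-lighting.safetensors \
    transformer/klein-9b-base-40x-lighting.safetensors \
    transformer/klein-9b-30x-below-lighting.safetensors \
    transformer/klein-9b-base-30x-below-lighting.safetensors \
    precision-config/50x-fp4-f8-sample.json \
    precision-config/40x-int4-f8-sample.json \
    precision-config/30x-below-int4-i8-sample.json
  do
    if [ ! -e "$root/$rel" ]; then
      echo "missing: $rel"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_layout ./Klein-9B-Series
```

If only one tier was downloaded, the entries for the other tiers will be reported as missing, which is expected.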

Distilled (4-step) vs Base (28-step)

| Transformer | Source | Steps | Guidance | Best for |
|---|---|---|---|---|
| klein-9b-<tier>-lighting.safetensors | Klein distilled | 4 | none (guidance-distilled) | Fastest |
| klein-9b-base-<tier>-lighting.safetensors | Klein base | 28 | --guidance-scale 4.0 (classical CFG) | Highest quality |

Inference

# 50x: Blackwell (RTX 50 / B-series). Distilled, 4-step:
quantfunc --model-dir klein-9b-series-50x-above-base-model \
  --transformer transformer/klein-9b-50x-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 4 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

# 40x: RTX 40 / Ada or Hopper (H100/H200). Base 28-step (classical CFG):
quantfunc --model-dir klein-9b-series-50x-below-base-model \
  --transformer transformer/klein-9b-base-40x-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 28 --guidance-scale 4.0 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

# 30x-below: RTX 30 and below. Distilled, 4-step:
quantfunc --model-dir klein-9b-series-50x-below-base-model \
  --transformer transformer/klein-9b-30x-below-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 4 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

--auto-optimize picks the VRAM/attention/compression strategy for your GPU. The ComfyUI Lighting plugin auto-selects the matching tier + precision-config.
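
The three commands above differ only in the base-model directory, the transformer file, and the step/guidance flags, so they can be folded into one wrapper. The sketch below is illustrative rather than an official tool: `klein_generate` and the `QUANTFUNC_BIN` variable are hypothetical names, and only flags that already appear in the commands above are used. Setting `QUANTFUNC_BIN=echo` prints the arguments instead of running anything, which is handy for a dry run.

```shell
# Hypothetical wrapper around the commands shown above.
# Usage: klein_generate TIER VARIANT PROMPT OUTPUT
#   TIER:    50x | 40x | 30x-below
#   VARIANT: distilled | base
klein_generate() {
  tier="$1"; variant="$2"; prompt="$3"; output="$4"

  # 50x uses the FP4 base-model; 40x and 30x-below share the INT4 one.
  case "$tier" in
    50x)           model_dir="klein-9b-series-50x-above-base-model" ;;
    40x|30x-below) model_dir="klein-9b-series-50x-below-base-model" ;;
    *) echo "unknown tier: $tier" >&2; return 1 ;;
  esac

  case "$variant" in
    distilled)
      # Guidance-distilled: 4 steps, no --guidance-scale.
      "${QUANTFUNC_BIN:-quantfunc}" --model-dir "$model_dir" \
        --transformer "transformer/klein-9b-${tier}-lighting.safetensors" \
        --model-backend lighting --auto-optimize --steps 4 \
        --prompt "$prompt" --output "$output" ;;
    base)
      # Non-distilled: 28 steps with classical CFG.
      "${QUANTFUNC_BIN:-quantfunc}" --model-dir "$model_dir" \
        --transformer "transformer/klein-9b-base-${tier}-lighting.safetensors" \
        --model-backend lighting --auto-optimize --steps 28 --guidance-scale 4.0 \
        --prompt "$prompt" --output "$output" ;;
    *) echo "unknown variant: $variant" >&2; return 1 ;;
  esac
}

# Example (run from inside Klein-9B-Series/):
#   klein_generate 40x base "a cute cat on a windowsill, watercolor style" out.png
```

Like the original commands, the wrapper uses relative paths, so it assumes the working directory is the root of the downloaded series.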

Precision Config (precision-config/)

| File | Tier / GPU | attention+FFN | islands |
|---|---|---|---|
| 50x-fp4-f8-sample.json | 50x: Blackwell (SM120+) | FP4 | FP8 |
| 40x-int4-f8-sample.json | 40x: Ada (SM89) & Hopper (SM90): RTX 40, L40, H100, H200 | INT4 | FP8 |
| 30x-below-int4-i8-sample.json | 30x-below: RTX 30/20, A100 (pre-FP8) | INT4 | INT8 |

These per-layer configs control the Lighting backend's quantization precision; customize them for your own speed/quality trade-off.

License

The pre-quantized weights are derived from FLUX.2 Klein, so users must comply with the original Black Forest Labs license (the FLUX Non-Commercial License v2.1, included as LICENSE). The QuantFunc inference engine and plugins are licensed separately.

Community

Join our community for support, updates, and discussions:

  • 🎮 Discord server
  • 💬 Scan the QR code below to join our WeChat group:
[Image: WeChat group QR code]