Finetune loss never decreases
I tried to fine-tune this model using the diffusers repo, following the method from the blog post "Diffusers welcomes FLUX-2". However, the loss has not decreased even after 5 epochs. Is there a step I'm missing?
Here is the loss graph:
Here are the parameters:
#! /bin/bash
MLFLOW_TRACKING_URI=file:///nvme/fahadh/mlruns MLFLOW_EXPERIMENT_NAME=flux2-train accelerate launch \
--config_file /nvme/fahadh/train-flux/config.yaml \
/nvme/fahadh/train-flux/diffusers/examples/dreambooth/train_dreambooth_lora_flux2.py \
--pretrained_model_name_or_path="/nvme/fahadh/models/FLUX.2-dev" \
--mixed_precision="bf16" \
--gradient_checkpointing \
--cache_latents \
--offload \
--remote_text_encoder \
--caption_column="caption" \
--dataset_name="/nvme/fahadh/datasets/image-dataset" \
--output_dir="/nvme/fahadh/train-flux/flux2_LoRA-final" \
--instance_prompt="" \
--train_batch_size=2 \
--guidance_scale=1 \
--gradient_accumulation_steps=1 \
--optimizer="prodigy" \
--learning_rate=1.0 \
--report_to="mlflow" \
--lr_scheduler="cosine" \
--lr_warmup_steps=0 \
--checkpointing_steps=250 \
--checkpoints_total_limit=2 \
--num_train_epochs=5 \
--rank=32 \
--lora_alpha=32 \
--lora_layers="attn.to_k,attn.to_q,attn.to_v,attn.to_out.0,attn.add_k_proj,attn.add_q_proj,attn.add_v_proj,attn.to_add_out,ff.net.0.proj,ff.net.2,ff_context.net.0.proj,ff_context.net.2" \
--aspect_ratio_buckets="768,1376;1024,1024;1024,1536;1200,896;1376,768;1536,1024" \
--seed="0" \
--torch_clear_cache_step=50 \
--bnb_quantization_config_path="/nvme/fahadh/train-flux/4bit_config.json"
I'm using AdamW8bit, LR=5e-5, batch=2, resolution=1024, GRADIENT_ACCUMULATION_STEPS = 4, WEIGHTING_SCHEME = "logit_normal", LORA_RANK = 64, LORA_ALPHA = 64, LORA_DROPOUT = 0.1. Contrary to what many people are saying, I'm doing this:
self.lr_scheduler = get_scheduler(
    "constant",
    optimizer=self.optimizer,
    num_warmup_steps=100,
    num_training_steps=self.max_train_steps,
)
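One thing worth checking in the call above: as far as I know, the "constant" schedule in diffusers' get_scheduler does not apply num_warmup_steps, and warmup only takes effect with "constant_with_warmup". The sketch below is plain Python, not the library's code. It only illustrates the learning-rate multiplier each schedule is assumed to produce, and the function names are made up for this example.

```python
# Illustrative LR multipliers, assuming "constant" ignores warmup and
# "constant_with_warmup" ramps linearly from 0 to 1 over the warmup steps.
# These helpers are hypothetical stand-ins, not diffusers functions.

def constant_multiplier(step: int) -> float:
    """Multiplier for a plain constant schedule: always 1.0."""
    return 1.0


def constant_with_warmup_multiplier(step: int, num_warmup_steps: int) -> float:
    """Linear ramp during warmup, then 1.0 for the rest of training."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    return 1.0


base_lr = 5e-5          # LR quoted in the reply above
num_warmup_steps = 100  # warmup length quoted in the reply above

for step in (0, 50, 100, 500):
    lr_constant = base_lr * constant_multiplier(step)
    lr_warmup = base_lr * constant_with_warmup_multiplier(step, num_warmup_steps)
    print(f"step={step:4d}  constant={lr_constant:.2e}  with_warmup={lr_warmup:.2e}")
```

If that assumption holds, the run described here trains at the full 5e-5 from the first step, which may be exactly what was intended.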
I'm going for full training. You can optionally reduce rank / alpha from 64 to 32. This setup works for me on a Flux.2 Klein base 9B training run. I set max steps to 1500 and save a checkpoint every 500 steps, which allows going back to an earlier checkpoint if you think you're overfitting.
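For a rough sense of scale, the numbers in this reply can be worked through as below. This is only a back-of-the-envelope sketch: it assumes a single GPU and that "max steps" counts optimizer updates (not micro-batches), neither of which is stated above.

```python
# Back-of-the-envelope arithmetic for the settings quoted in the reply.
per_device_batch = 2     # batch=2
grad_accum_steps = 4     # GRADIENT_ACCUMULATION_STEPS = 4
num_devices = 1          # assumption: single GPU (not stated in the reply)
max_steps = 1500         # assumption: counted in optimizer updates
checkpoint_every = 500

# Samples contributing to each optimizer update.
effective_batch = per_device_batch * grad_accum_steps * num_devices

# Total training samples processed over the whole run.
samples_seen = effective_batch * max_steps

# Steps at which a checkpoint is written, giving three points to roll back to.
checkpoint_steps = list(range(checkpoint_every, max_steps + 1, checkpoint_every))

print("effective batch size:", effective_batch)
print("samples seen:", samples_seen)
print("checkpoints at steps:", checkpoint_steps)
```

Dividing the samples seen by the dataset size gives the approximate number of epochs, which is useful when comparing against the 5-epoch run in the original question.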
