Anima ControlNet-LLLite Sample Weights

Sample ControlNet-LLLite weights for the Anima-Base v1.0 image generation model, trained with anima_train_control_net_lllite.py from the sd-scripts repository.

ControlNet-LLLite is a lightweight, LoRA-like conditional control module ported to Anima's DiT (MiniTrainDIT) architecture. See the training & inference guide for full details on the v2 architecture, dataset format, and how to run inference.

An experimental ComfyUI node is also available: kohya-ss/ComfyUI-Anima-LLLite.

About earlier Preview3-based weights. Sample weights trained against the older Anima Preview3 base model (lineart / depth / pose / fake scribble, plus the Preview3-era any-test like v1 and inpainting v1) are still hosted in this repository and are documented in the legacy model card PREVIEW3.md. The four conditioning types that were not retrained for Anima-Base v1.0 (lineart / depth / pose / fake scribble) are available only there. The Preview3 weights also work on Anima-Base v1.0, with somewhat reduced quality.

日本語

sd-scripts の anima_train_control_net_lllite.py で Anima-Base v1.0 向けに学習した、ControlNet-LLLite のサンプル重みです。アーキテクチャ、データセット形式、推論手順の詳細は学習・推論ガイドを参照してください。実験的な ComfyUI ノードも kohya-ss/ComfyUI-Anima-LLLite で公開しています。

旧 Preview3 ベースの重みについて: 旧 Anima Preview3 向けに学習した重み — lineart / depth / pose / fake scribble および Preview3 世代の any-test like / inpainting v1 — は引き続き本リポジトリで公開しています。詳細は旧モデルカード PREVIEW3.md を参照してください。Anima-Base v1.0 向けに再学習していない 4 種（lineart / depth / pose / scribble）はそちらでのみ公開しています。Preview3 版重みは Anima-Base v1.0 上でも品質はやや落ちるものの利用可能です。

Released Weights / 公開する重み

File	Type	Conditioning source
`anima-lllite-inpainting-v2.safetensors`	inpainting (4ch: RGB + mask)	Generated images with dynamic masking
`anima-lllite-any-test-like-v2.safetensors`	any-test like (mixed)	Lineart / scribble / grayscale, heavily augmented

The v2 suffix indicates the Anima-Base v1.0 generation, distinguishing these weights from the Preview3-era v1 series.

日本語

v2 は Anima-Base v1.0 世代であることを示すサフィックスで、Preview3 世代の v1 系と区別するためのものです。

Sample / サンプル

Type	Cond image	Mask image	Generated image
inpainting	[image]	[image]	[image]
any-test like	[image]	---	[image]

Common Setup / 共通設定

Base models

Anima DiT: Anima-Base v1.0 (anima-base-v1.0.safetensors)
VAE: Qwen-Image VAE
Text encoder: Qwen3-0.6B (base)

Dataset (common to all `v2` weights)

Target images: 4,000 images generated by Anima from random prompts (twice the Preview3 set).
Image composition: ~3/4 contain people (varied gender, single-person to multi-person scenes); the remaining ~1/4 are animals, landscapes, or other no-person content.
Resolution distribution (Anima-Base v1.0 supports 512² to 1536², expanded from Preview3's 512²–1024²):
- 50% (2,000 images) — 1024²-area Aspect Ratio Bucketing (same as Preview3).
- 12.5% each (500 images × 4) — 640² / 768² / 1280² / 1536²-area Aspect Ratio Bucketing.
Bucket settings: enable_bucket = true, bucket_no_upscale = true, bucket_reso_steps = 16, min_bucket_reso = 64, max_bucket_reso = 3072. The exact per-bucket resolutions are determined automatically from the source images' actual sizes.
Conditioning images: automatically generated from each target image. Generation method differs per model (see below).
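The resolution mix and the no-upscale bucketing described above can be illustrated with a small sketch. It is not the sd-scripts implementation: the function names and the rounding rule (shrink to the area budget if needed, then round each side down to a multiple of bucket_reso_steps) are assumptions made for illustration, and min_bucket_reso / max_bucket_reso are ignored. Only the counts, shares, and step size come from this card.

```python
import math

TOTAL_IMAGES = 4000
# Share of target images per area budget, keyed by the side of the square area.
AREA_SHARES = {1024: 0.50, 640: 0.125, 768: 0.125, 1280: 0.125, 1536: 0.125}


def images_per_area(total=TOTAL_IMAGES, shares=AREA_SHARES):
    """Number of target images assigned to each area budget."""
    return {side: round(total * share) for side, share in shares.items()}


def no_upscale_bucket(width, height, area_side=1024, step=16):
    """Illustrative bucket choice: never enlarge, shrink only if over budget."""
    max_area = area_side * area_side
    if width * height > max_area:
        # Scale both sides equally so the area fits the budget.
        scale = math.sqrt(max_area / (width * height))
        width, height = width * scale, height * scale
    # Round each side down to a multiple of `step` (assumed rounding rule).
    return int(width) // step * step, int(height) // step * step


if __name__ == "__main__":
    print(images_per_area())
    print(no_upscale_bucket(1000, 700))
```

Under these assumptions, an image smaller than the area budget keeps its own size (rounded down to the step), which is why the exact bucket list depends on the source images.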

Common training hyperparameters

Optimizer: adamw8bit
Mixed / save precision: bf16
Batch size: 12 (--gradient_checkpointing enabled — to accommodate the wider resolution range up to 1536²)
Seed: 42
Caption dropout: caption_dropout_rate = 0.15 (enabled to support CFG at inference time; Preview3 weights were trained without caption dropout, except for inpainting)
Caching: --cache_latents_to_disk --cache_text_encoder_outputs_to_disk
Attention backend: --attn_mode flash
Hardware: NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition

Difference from Preview3 common setup. Preview3 trained at resolutions up to 1024² with batch size 6 and --gradient_checkpointing disabled. Because Anima-Base v1.0 extends the range to 1536², the common setup here uses batch size 12 with --gradient_checkpointing enabled. The LLLite dims (--cond_emb_dim, --lllite_cond_dim, --lllite_mlp_dim) no longer have a shared default and are set for each model.
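As a rough illustration, the flags quoted in this card can be collected into per-model argument lists. This is a sketch, not a complete training command: the names COMMON_FLAGS, PER_MODEL_FLAGS, and training_flags are hypothetical, and settings the card gives without a flag name (optimizer, precision, batch size, seed, caption dropout, dataset and model paths) are deliberately left out rather than guessed.

```python
# Flags that this card lists for the common setup.
COMMON_FLAGS = [
    "--gradient_checkpointing",
    "--cache_latents_to_disk",
    "--cache_text_encoder_outputs_to_disk",
    "--attn_mode", "flash",
]

# Per-model flags copied from the per-model sections of this card.
PER_MODEL_FLAGS = {
    "inpainting": [
        "--learning_rate", "1e-3",
        "--timestep_sampling", "shift",
        "--discrete_flow_shift", "3.0",
        "--lllite_target_layers", "self_attn_q_pre,self_attn_kv_pre,mlp_fc1_pre",
        "--lllite_cond_resblocks", "4",
        "--cond_emb_dim", "64",
        "--lllite_cond_dim", "128",
        "--lllite_mlp_dim", "64",
        "--lllite_cond_in_channels", "4",
        "--lllite_inpaint_masked_input",
        "--max_train_epochs", "64",
    ],
    "any-test-like": [
        "--learning_rate", "1e-3",  # phase 1 value; phase 2 used 3e-4
        "--timestep_sampling", "shift",
        "--discrete_flow_shift", "3.0",
        "--lllite_target_layers", "self_attn_q_pre",
        "--lllite_cond_resblocks", "6",
        "--cond_emb_dim", "32",
        "--lllite_cond_dim", "64",
        "--lllite_mlp_dim", "64",
        "--max_train_epochs", "32",
    ],
}


def training_flags(model):
    """Return the card's common flags followed by the flags for one model."""
    return COMMON_FLAGS + PER_MODEL_FLAGS[model]


if __name__ == "__main__":
    print(" ".join(training_flags("inpainting")))
```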

日本語

ベースモデル: Anima DiT (Anima-Base v1.0 / anima-base-v1.0.safetensors) / Qwen-Image VAE / Qwen3-0.6B (base)
対象画像: Anima でランダムプロンプトで生成した 4,000 枚（Preview3 の倍）。人物約 3/4、残りは動物・風景など。
解像度分布（Anima-Base v1.0 は Preview3 の 512²–1024² から 512²–1536² へサポート拡大）:
- 50%（2,000 枚） — 1024² 面積 ARB（Preview3 と同じ）。
- 各 12.5%（500 枚 × 4） — 640² / 768² / 1280² / 1536² 面積 ARB。
バケット設定: enable_bucket = true、bucket_no_upscale = true、bucket_reso_steps = 16、min_bucket_reso = 64、max_bucket_reso = 3072。実バケット解像度は元画像から自動算出。
共通ハイパーパラメータ: adamw8bit、bf16、batch size 12（--gradient_checkpointing 有効、解像度上限 1536² に対応するため）、seed 42、caption_dropout_rate 0.15（推論時の CFG 対応のため有効化。Preview3 重みは inpainting を除き caption dropout なしで学習）、latent / TE 出力ともディスクキャッシュ、--attn_mode flash。
ハード: RTX PRO 6000 Blackwell Max-Q Workstation Edition。

Preview3 との差分: Preview3 は 1024² までで batch 6・gradient checkpointing 無効でしたが、Anima-Base v1.0 は 1536² まで扱う関係で batch 12・gradient checkpointing 有効に変更。LLLite 次元（--cond_emb_dim / --lllite_cond_dim / --lllite_mlp_dim）は共通デフォルトを置かず、モデルごとに設定します。

Per-Model Details / 各モデルの詳細

1. inpainting (4ch conditioning)

Conditioning: 4-channel — RGB image with the masked region blacked out, concatenated with a 1-channel binary mask. Trained with --lllite_cond_in_channels 4 --lllite_inpaint_masked_input.
Generation method: dynamic mask generation (same procedure as Preview3 inpainting v1). For each training step, a mask is generated on the fly from the target image.
Pairs: 4,000 (all target images).
Per-model hyperparameters:
- --learning_rate 1e-3
- --timestep_sampling shift --discrete_flow_shift 3.0
- --lllite_target_layers self_attn_q_pre,self_attn_kv_pre,mlp_fc1_pre (Q, K/V, and MLP fc1 all injected — wider injection than the other models)
- --lllite_cond_resblocks 4
- --cond_emb_dim 64 --lllite_cond_dim 128 --lllite_mlp_dim 64 (larger dims than the Preview3 common setup of 32/32/32 — unlike the other ControlNet types, inpainting must preserve the color information of the unmasked region of the input image, which requires more capacity)
- --lllite_cond_in_channels 4 --lllite_inpaint_masked_input
- --max_train_epochs 64 configured. Published weight: epoch 30 (≈ 10,770 steps), saved as anima-lllite-inpainting-v2.safetensors.
- Wall-clock: ~17 hours 30 minutes on the hardware listed above. Loss was still trending down at the stopping point, but the model already produced clearly effective results, and continuing risked overfitting to non-essential details — so training was stopped here.
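The 4-channel conditioning described above (RGB with the masked region blacked out, plus a binary mask) can be sketched with NumPy as follows. The function name build_inpaint_cond, the channel-last layout, the [0, 1] value range, and the convention that mask value 1 marks the region to repaint are all assumptions for illustration; the training script's actual preprocessing may differ.

```python
import numpy as np


def build_inpaint_cond(rgb, mask):
    """Build an (H, W, 4) conditioning array from an image and a mask.

    rgb:  (H, W, 3) array with values in [0, 1] (assumed range).
    mask: (H, W) array; values above 0.5 mark the region to repaint (assumed).
    """
    rgb = np.asarray(rgb, dtype=np.float32)
    binary = (np.asarray(mask) > 0.5).astype(np.float32)
    # Black out the masked region while keeping the unmasked colors.
    masked_rgb = rgb * (1.0 - binary)[..., None]
    # Append the binary mask as the fourth channel.
    return np.concatenate([masked_rgb, binary[..., None]], axis=-1)


if __name__ == "__main__":
    image = np.ones((4, 4, 3), dtype=np.float32)
    hole = np.zeros((4, 4), dtype=np.float32)
    hole[1:3, 1:3] = 1.0
    print(build_inpaint_cond(image, hole).shape)
```

Keeping the unmasked pixels intact in the first three channels matches the card's point that inpainting has to preserve the color information outside the mask.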

Usage notes / 利用上の注意

Use img2img together with a mask; without the mask, colors may shift slightly. The workflow is embedded in the sample image.
Please update the Anima LLLite ComfyUI node to the latest version.

日本語

conditioning: 4 チャンネル — マスク領域を黒で塗りつぶした RGB 画像と、1 チャンネルの 2 値マスクの結合。--lllite_cond_in_channels 4 --lllite_inpaint_masked_input で学習しています。
マスク生成: 学習ステップごとに動的にマスクを生成（Preview3 inpainting v1 と同じ手順）。
ペア数: 4,000（教師画像すべて）。
ハイパーパラメータ:
- 学習率 1e-3、--timestep_sampling shift --discrete_flow_shift 3.0
- target self_attn_q_pre,self_attn_kv_pre,mlp_fc1_pre（Q / K/V / MLP fc1 すべて、他モデルより広め）
- --lllite_cond_resblocks 4
- --cond_emb_dim 64 --lllite_cond_dim 128 --lllite_mlp_dim 64（Preview3 共通設定の 32/32/32 より大きく設定。他の ControlNet と異なり、inpainting では入力画像のマスク領域外の色情報を保持する必要があるため、より大きい容量が必要）
- --lllite_cond_in_channels 4 --lllite_inpaint_masked_input
- --max_train_epochs 64 設定。公開重みは epoch 30（≈ 10,770 step）時点で、ファイル名は anima-lllite-inpainting-v2.safetensors。
- 学習時間: 上記ハードウェアで約 17 時間 30 分。停止時点でも loss はまだ下がり続けていましたが、既に十分効くレベルに達しており、これ以上続けると本質的でない細部に過学習する懸念があったため、ここで打ち切りました。

利用上の注意

img2img でマスクを併用することを推奨します（併用しないと色味が微妙に変化することがあります）。サンプル画像が workflow を含んでいます。
ComfyUI の Anima LLLite ノードを最新版に更新してください。

2. any-test like (3ch conditioning)

Conditioning: 3-channel RGB. Mix of five conditioning types — HED scribble, PiDiNet scribble, Grayscale A, Grayscale B, and lineart — all heavily augmented. (Default --lllite_cond_in_channels 3, no --lllite_inpaint_masked_input.)
Generation method: each of the five conditioning types is generated for every target image, giving 20,000 conditioning images in total:
- HED scribble / PiDiNet scribble / Grayscale A / Grayscale B — same generation pipeline, augmentation, and additional post-processing as Preview3-era any-test like v1 (see PREVIEW3.md).
- lineart — extracted via a random mix of AniLines (basic / detail) and MangaLineExtraction at a sampling ratio of 1 : 1 : 1.5. For each image, the script randomly combines input preprocessing (scale / blur / contrast / gamma), model-internal parameters (sharpness / sobel ksize), post-processing (threshold / two-step / tone curve), morphology (thicken / none), and final blur / contrast / brightness / invert. The same lineart-specific additional augmentation used for Preview3 v1 is then applied on top. The Preview3 v1 lineart extractor was replaced with this mix because it was too slow at the 4,000-image scale.
Pairs: 20,000 (4,000 target images × 5 conditioning types).
Per-model hyperparameters:
- --timestep_sampling shift --discrete_flow_shift 3.0
- --lllite_target_layers self_attn_q_pre (Q only — narrower injection than inpainting)
- --lllite_cond_resblocks 6
- --cond_emb_dim 32 --lllite_cond_dim 64 --lllite_mlp_dim 64
- --max_train_epochs 32 configured. Trained in two phases:
  - Phase 1 (Blackwell RTX PRO 6000 Max-Q, common setup): --learning_rate 1e-3, batch size 12, 6 epochs (≈ 11,000 steps), ~17 hours. Loss stopped decreasing around epoch 6, so training was continued on a separate machine.
  - Phase 2 (RTX A6000 × 2): --learning_rate 3e-4, batch size 8 per GPU × 2 GPUs (effective batch size 16), 6 epochs (≈ 9,000 steps), ~50 hours. Loss continued to decrease for the first ~2 epochs of this phase, then plateaued; training was stopped at epoch 6.
- Published weight: cumulative epoch 12 (≈ 20,000 steps), saved as anima-lllite-any-test-like-v2.safetensors.
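The 1 : 1 : 1.5 sampling ratio used for the lineart extractors above can be read as a weighted random choice. The sketch below assumes that interpretation; the extractor labels and function names are hypothetical, and the actual data-generation script is not part of this card.

```python
import random

# Hypothetical labels for the three extractors, weighted 1 : 1 : 1.5.
EXTRACTOR_WEIGHTS = {
    "anilines_basic": 1.0,
    "anilines_detail": 1.0,
    "manga_line_extraction": 1.5,
}


def extractor_probabilities(weights=EXTRACTOR_WEIGHTS):
    """Normalize the ratio into per-extractor probabilities."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def pick_extractor(rng, weights=EXTRACTOR_WEIGHTS):
    """Draw one extractor label for one image according to the ratio."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)
    print(extractor_probabilities())
    print([pick_extractor(rng) for _ in range(5)])
```

With these weights, MangaLineExtraction is chosen for 3 in 7 images on average and each AniLines variant for 2 in 7.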

日本語

conditioning: 3 チャンネル RGB。HED scribble / PiDiNet scribble / Grayscale A / Grayscale B / lineart の 5 種類を混合（いずれも強めの augmentation 付き）。--lllite_cond_in_channels はデフォルト (3)、--lllite_inpaint_masked_input なし。
生成方法: 5 種類それぞれを 4,000 枚すべての教師画像について生成し、合計 20,000 枚:
- HED scribble / PiDiNet scribble / Grayscale A / Grayscale B — Preview3 any-test like v1 と同じ生成パイプライン・augmentation・追加加工（詳細は PREVIEW3.md）。
- lineart — AniLines (basic / detail) と MangaLineExtraction をサンプリング比 1 : 1 : 1.5 でランダムに使用。1 枚ごとに入力前処理（scale / blur / contrast / gamma）、モデル内部パラメータ（sharpness / sobel ksize）、後処理（threshold / two-step / tone curve）、モルフォロジー（thicken / none）、最終 blur / contrast / brightness / invert をランダムに組み合わせて生成。生成後に Preview3 v1 と同じ lineart 用追加 augmentation を適用。Preview3 v1 で使用していた線画抽出モデルは 4,000 枚規模では遅すぎたため、この組み合わせに置き換えました。
ペア数: 20,000（教師画像 4,000 枚 × 5 種）。
ハイパーパラメータ:
- --timestep_sampling shift --discrete_flow_shift 3.0
- target self_attn_q_pre（Q のみ。inpainting より狭め）
- --lllite_cond_resblocks 6
- --cond_emb_dim 32 --lllite_cond_dim 64 --lllite_mlp_dim 64
- --max_train_epochs 32 設定。2 段階で学習:
  - Phase 1（Blackwell RTX PRO 6000 Max-Q、共通設定どおり）: 学習率 1e-3、batch size 12、6 epoch（≈ 11,000 step）、約 17 時間。epoch 6 付近で loss が下がらなくなったため別 PC で継続学習。
  - Phase 2（RTX A6000 × 2 の 2GPU 学習）: 学習率 3e-4、batch size 8/GPU × 2 GPU（実効 batch 16）、6 epoch（≈ 9,000 step）、約 50 時間。最初の 2 epoch 程度までは loss が下がりましたが、それ以降は顕著に下がらなくなり 6 epoch で打ち切り。
- 公開重み: 累計 epoch 12（≈ 20,000 step）、ファイル名は anima-lllite-any-test-like-v2.safetensors。

Usage / 使い方

See the inference section of the training guide for anima_minimal_inference_control_net_lllite.py. Architecture metadata is embedded in each .safetensors, so you normally only need to point --lllite_weights at the file and pass a --control_image.
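A minimal sketch of how such an invocation might be assembled is shown below. Only the script name and the two flags come from this card; the helper name inference_command and the file control.png are hypothetical, and the prompt, base-model paths, and any other required arguments are documented in the guide and omitted here.

```python
import shlex


def inference_command(weights_path, control_image_path):
    """Assemble the LLLite-specific part of an inference command line."""
    return [
        "python",
        "anima_minimal_inference_control_net_lllite.py",
        "--lllite_weights", weights_path,
        "--control_image", control_image_path,
        # Prompt, model paths, and other arguments are intentionally omitted.
    ]


if __name__ == "__main__":
    cmd = inference_command(
        "anima-lllite-any-test-like-v2.safetensors", "control.png"
    )
    print(shlex.join(cmd))
```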

An experimental ComfyUI node is available at kohya-ss/ComfyUI-Anima-LLLite.

License / ライセンス

These weights follow the same license as the Anima base model. Please refer to the Anima model card for terms of use.

A copy of the CircleStone Labs Non-Commercial License is included in this repository as LICENSE.

日本語

本重みのライセンスは Anima 本体に準拠します。利用条件については Anima 本体のモデルカードを参照してください。

CircleStone Labs Non-Commercial License のコピーはこのリポジトリの LICENSE として同梱しています。

Credits / クレジット

ControlNet-LLLite (original SDXL implementation) and Anima port — kohya-ss.
Lineart extraction models used for any-test like v2 conditioning generation:
- AniLines (Anime Lineart Extractor) by zhenglin (MIT License)
- MangaLineExtraction (PyTorch) by Chengze "Miaomiao" Li (MIT License)
Preview3-era weights (lineart / depth / pose / fake scribble / any-test like v1 / inpainting v1) and their data-pipeline credits — see PREVIEW3.md.
