Krea 2 Projector Explorations

Small, Krea-derived interpretability artifacts for Krea 2's text conditioning: the learned layer mix ("multilayer feature aggregation") plus single-layer probes. Full toolkit, methods, figures, and write-up: github.com/fblissjr/krea-explorations.

Krea 2's text encoder is a frozen Qwen3-VL-4B. The DiT takes 12 selected encoder hidden-state layers [2,5,8,11,14,17,20,23,26,29,32,35] (select_layers), combines them with cross-layer attention, then applies a learned Linear(12 → 1) projector (txtfusion.projector). That weight matrix is the model's own per-layer weighting, identical in Raw and Turbo (cosine similarity 1.0):

layer     L2     L5     L8    L11    L14    L17    L20    L23    L26    L29    L32    L35
w      -0.05  -0.16  +0.37  +0.50  +0.71  +0.39  +0.40  -1.44  -0.51  -0.89  -0.61  +0.11

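The projector combines the slots contrastively ("mid plus, deep minus"): L8 through L20 carry positive weights and L23 through L32 carry negative ones, so the result is not an average.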
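The weighting in the table above can be sketched in plain Python. This is an illustrative stand-in, not the model's code: it uses the rounded table values, assumes the projector has no bias term, and the names `SELECT_LAYERS`, `WEIGHTS`, and `project` are hypothetical.

```python
# Illustrative sketch: a plain-Python stand-in for the Linear(12 -> 1) projector.
# WEIGHTS are the rounded values from the table; a zero/absent bias is ASSUMED.
SELECT_LAYERS = list(range(2, 36, 3))  # [2, 5, 8, ..., 35]
WEIGHTS = [-0.05, -0.16, 0.37, 0.50, 0.71, 0.39,
           0.40, -1.44, -0.51, -0.89, -0.61, 0.11]


def project(slots, weights=WEIGHTS):
    """Collapse 12 equal-length slot vectors into one vector by weighted sum."""
    if len(slots) != len(weights):
        raise ValueError("expected one slot per projector weight")
    dim = len(slots[0])
    return [sum(w * slot[d] for w, slot in zip(weights, slots))
            for d in range(dim)]


# Split the weights by sign to see the contrastive structure.
positive_mass = sum(w for w in WEIGHTS if w > 0)
negative_mass = sum(w for w in WEIGHTS if w < 0)
```

With these rounded values the positive weights sum to about 2.48 and the negative ones to about -3.66, so feeding 12 identical slots returns roughly -1.18 times that slot. An average would return the slot unchanged.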

What we measured

These are characterizations of an open model's learned behavior, not of its architecture, which is public; most are low-effort to reproduce. Full methods and confidence levels are in the GitHub repo.

  • L20 is a learned directional attention hub. In the cross-layer attention, ~91–95% of content tokens route to layer 20. The effect is content-driven (not a padding artifact) and directional (not a magnitude sink). It holds across 5 prompts and on both Raw and Turbo. The token-side "refiner" blocks, by contrast, are diffuse (no hub).
  • The projector-rebalance lever is a detail/intensity knob, not an attribute gate. Benign attributes (expression, "wet", blush) come through the aggregation and render with or without rebalancing; boosting the deep layers mainly shifts detail, contrast, and intensity, consistent with the deep layers carrying fine detail.
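One way to turn the hub claim in the first bullet into a number is sketched below. It assumes "routes to" means the layer slot that receives a token's largest attention weight; the repo may use a different definition (for example attention mass). The function name `hub_fraction` and its inputs are hypothetical.

```python
# Illustrative sketch: fraction of content tokens whose strongest cross-layer
# attention weight lands on the hub layer. The argmax definition is an ASSUMPTION.
SELECT_LAYERS = list(range(2, 36, 3))  # [2, 5, 8, ..., 35]


def hub_fraction(attn_rows, layers=SELECT_LAYERS, hub_layer=20, content_mask=None):
    """attn_rows: one row per token, each holding 12 weights over the layer slots.

    content_mask marks which tokens count as content (False = padding, skipped).
    """
    if content_mask is None:
        content_mask = [True] * len(attn_rows)
    hub_idx = layers.index(hub_layer)
    content = [row for row, keep in zip(attn_rows, content_mask) if keep]
    if not content:
        raise ValueError("no content tokens to measure")
    # Count tokens whose largest attention weight sits on the hub slot.
    hits = sum(1 for row in content
               if max(range(len(row)), key=row.__getitem__) == hub_idx)
    return hits / len(content)
```

Masking out padding tokens before counting is what separates a content-driven hub from a padding artifact in this sketch.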

Per-layer reweighting of Krea 2's conditioning was introduced by nova452/ComfyUI-ConditioningKrea2Rebalance and refined by huwhitememes/comfyui-krea2-conditioning.
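As a rough picture of what a per-layer rebalance does, the sketch below scales chosen projector weights by a gain. It is not taken from either node listed above, which may reweight the conditioning differently; `rebalance` and the example gains are hypothetical, and the weights are the rounded table values.

```python
# Illustrative sketch: multiplicative per-layer gains on the projector weights.
# This is an ASSUMED mechanism, not the implementation of the cited nodes.
SELECT_LAYERS = list(range(2, 36, 3))  # [2, 5, 8, ..., 35]
WEIGHTS = [-0.05, -0.16, 0.37, 0.50, 0.71, 0.39,
           0.40, -1.44, -0.51, -0.89, -0.61, 0.11]


def rebalance(weights, layers, gains):
    """Return new weights; layers missing from `gains` keep a gain of 1.0."""
    unknown = set(gains) - set(layers)
    if unknown:
        raise ValueError(f"not a selected layer: {sorted(unknown)}")
    return [w * gains.get(layer, 1.0) for w, layer in zip(weights, layers)]


# Hypothetical example: boost the deep layers L23 to L32 by 1.5x.
boosted = rebalance(WEIGHTS, SELECT_LAYERS, {23: 1.5, 26: 1.5, 29: 1.5, 32: 1.5})
```

Because the deep weights are negative, a gain above 1 makes the contrast with the mid layers stronger without adding or removing any input.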

Files

  • krea2_projector_original_weights.safetensors: a reference copy of the 12 learned projector weights above (the [1,12] tensor itself). Read-only reference, not a LoRA to apply.
  • solo/projector_solo_bNN_Lxx.safetensors: 12 diagnostic probes. Each is a projector .diff that, at strength 1, keeps one of the projector's 12 inputs and zeroes the other 11, so the DiT conditions on a single slot. This is useful for seeing what that slot contributes (deep slots render coherent images, shallow slots are noise, L14 carries text/structure, L35 alone is unusable).

Important: the projector's 12 inputs are the attention-mixed slots (output of the 2 layerwise blocks), not pristine encoder layers. Because the cross-layer attention routes through L20, every slot already carries L20 content, so a "solo Lx" isolates the slot indexed by layer x, not a clean layer x. These are interpretability probes, not generation LoRAs (keeping one input by design gives a partial/degraded image).

Each solo/ file is a diffusion_model.txtfusion.projector.diff patch (one [1,12] tensor, ~300 bytes), loadable via the stock LoraLoaderModelOnly node, with no custom node required. (ComfyUI calls the selected layers "taps".)
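The arithmetic behind a solo probe can be sketched as follows. Two assumptions are made: the loader applies a .diff additively, scaled by strength (weight + strength × diff), and the kept slot retains its original weight. The helpers `solo_diff` and `effective_weights` are hypothetical, the weights are the rounded table values, and reading or writing the actual .safetensors files is left out.

```python
# Illustrative sketch: build a "solo" diff and apply it at a given strength.
# ASSUMES additive application (weight + strength * diff) and that the kept
# slot retains its original weight; the shipped files may differ in detail.
SELECT_LAYERS = list(range(2, 36, 3))  # [2, 5, 8, ..., 35]
WEIGHTS = [-0.05, -0.16, 0.37, 0.50, 0.71, 0.39,
           0.40, -1.44, -0.51, -0.89, -0.61, 0.11]


def solo_diff(weights, layers, keep_layer):
    """Diff that cancels every weight except the one for keep_layer."""
    keep_idx = layers.index(keep_layer)
    return [0.0 if i == keep_idx else -w for i, w in enumerate(weights)]


def effective_weights(weights, diff, strength=1.0):
    """Weights the projector would use after the patch is applied."""
    return [w + strength * d for w, d in zip(weights, diff)]


diff_l14 = solo_diff(WEIGHTS, SELECT_LAYERS, keep_layer=14)
solo_l14 = effective_weights(WEIGHTS, diff_l14, strength=1.0)
half_l14 = effective_weights(WEIGHTS, diff_l14, strength=0.5)
```

Under these assumptions, strength 1 leaves only the L14 weight non-zero, while intermediate strengths fade the other 11 inputs down instead of removing them.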

License

These artifacts derive from Krea 2 and are covered by the Krea 2 Community License (see the base model, krea/Krea-2-Raw).

Model tree for fbjr/krea-explorations

Base model: krea/Krea-2-Raw