counterfact-plantain
A per-head attention probe of FLUX.2 Klein 4B testing whether the base model represents the counterfactual modal frame as a separable axis over identical actual outcomes.
Thesis
after-plantain established that ~1% of Klein's heads represent post-event states as a categorical concept. counterfact-plantain extends the question upstream from "did this event happen" to "is this description framed factually or counterfactually." The factual A condition and the counterfactual B condition describe the same actual outcome; the distinguishing variable is purely the modal frame ("would not have, had ..."). If a per-head signal exceeds the empirical null on this stimulus set, image-generation pretraining encodes counterfactual structure as a separable axis, which is the load-bearing primitive of any genuine world model.
Method
Twenty-five paired prompts. The A condition is purely descriptive ("the ball rolled left across the tilted table"). The B condition adds an explicit counterfactual conditional with the same actual content ("the ball rolled left across the tilted table; it would not have, without the tilt"). Pairs span physical, thermodynamic, biological, and mechanical causation. Within-pair length is matched. The "as expected"/"contrary to expectations" framing is deliberately avoided to prevent confounds with vocabulary-frequency priors.
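The paired design can be sketched as data. The two prompts below are the examples quoted above; the field names and the structural checks are illustrative assumptions, not the actual dataset schema:

```python
# Hypothetical stimulus-pair format. The prompts are the card's own examples;
# the "domain" field and the checks are assumptions about the design.
pairs = [
    {
        "domain": "physical",
        "factual": "the ball rolled left across the tilted table",
        "counterfactual": (
            "the ball rolled left across the tilted table; "
            "it would not have, without the tilt"
        ),
    },
]

for p in pairs:
    # B must restate A's actual content verbatim ...
    assert p["counterfactual"].startswith(p["factual"])
    # ... and the appended clause must carry the modal frame, not new outcome content
    clause = p["counterfactual"][len(p["factual"]):]
    assert "would not have" in clause
print("stimulus checks passed")
```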
Per-head capture identical to the rest of the plantain probe family: forward pre-hook on every transformer block's attention output projection, per-head RMS magnitude, one inference step at guidance_scale=1.0, fixed seed. Across the 25 pairs, per-head paired t-statistics are computed on (factual − counterfactual) magnitudes. Empirical null is 1,000 sign-flip permutations.
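The per-head statistics reduce to a short computation. A minimal sketch on synthetic magnitudes: the real capture attaches a PyTorch forward pre-hook to each block's attention output projection, which is omitted here; the captured RMS values are simulated, while the array shapes (25 pairs) and the |t| > 3 threshold follow the card:

```python
import numpy as np

def paired_t(diffs):
    """Per-head paired t-statistics for (factual - counterfactual) magnitudes.

    diffs: array of shape (n_pairs, n_heads)."""
    n = diffs.shape[0]
    return diffs.mean(axis=0) / (diffs.std(axis=0, ddof=1) / np.sqrt(n))

def signflip_null_count_p99(diffs, thresh=3.0, n_perm=1000, seed=0):
    """Empirical null: randomly flip each pair's sign, recount heads with
    |t| > thresh, and return the 99th percentile of that count."""
    rng = np.random.default_rng(seed)
    n = diffs.shape[0]
    counts = []
    for _ in range(n_perm):
        flips = rng.choice([-1.0, 1.0], size=(n, 1))
        counts.append(int((np.abs(paired_t(diffs * flips)) > thresh).sum()))
    return float(np.percentile(counts, 99))

# Synthetic stand-in for captured per-head RMS differences:
# 25 pairs x 200 heads, with an injected effect in the first 40 heads.
rng = np.random.default_rng(1)
diffs = rng.normal(0.0, 1.0, size=(25, 200))
diffs[:, :40] += 1.0

t_obs = paired_t(diffs)
n_sig = int((np.abs(t_obs) > 3.0).sum())
null_p99 = signflip_null_count_p99(diffs)
print(f"heads with |t|>3: {n_sig}; sign-flip null p99: {null_p99}")
```

Sign-flipping is the natural permutation for a paired design: under the null of no framing effect, each pair's (factual − counterfactual) difference is symmetric around zero.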
Rigor add-ons: per-head Cohen's d effect size; split-half consistency via 100 random 50/50 stimulus splits.
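The rigor add-ons can be sketched the same way, on simulated per-head difference magnitudes (shapes and the 100-split protocol follow the card; the data itself is synthetic):

```python
import numpy as np

def cohens_d(diffs):
    # Paired Cohen's d per head: mean difference over SD of differences.
    return diffs.mean(axis=0) / diffs.std(axis=0, ddof=1)

def split_half_consistency(diffs, n_splits=100, seed=0):
    """Median Pearson r between per-head t maps computed on random
    disjoint 50/50 splits of the stimulus pairs."""
    rng = np.random.default_rng(seed)
    n = diffs.shape[0]

    def t(x):
        return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(x.shape[0]))

    rs = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        half_a, half_b = diffs[perm[: n // 2]], diffs[perm[n // 2 :]]
        rs.append(np.corrcoef(t(half_a), t(half_b))[0, 1])
    return float(np.median(rs))

# Synthetic stand-in: 25 pairs x 200 heads, injected effect in the first 40 heads.
rng = np.random.default_rng(1)
diffs = rng.normal(0.0, 1.0, size=(25, 200))
diffs[:, :40] += 1.0

d = cohens_d(diffs)
r = split_half_consistency(diffs)
print(f"median |d| over effect heads: {np.median(np.abs(d[:40])):.2f}; "
      f"split-half r: {r:.2f}")
```

Because the two halves contain disjoint stimuli, their noise is independent, so a high median r indicates a stable per-head axis rather than an artifact of particular pairs.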
Results
| Metric | Value | Significance |
|---|---|---|
| Heads with \|t\| > 3 | 3,469 (21.3%) | 5.9× empirical null p99 |
| Heads with \|t\| > 5 | 835 (5.1%) | 167× empirical null p99 |
| Heads with \|d\| > 0.8 (large) | 1,718 (10.5%) | – |
| Split-half r (median) | 0.639 | [0.61, 0.65] IQR |
| Max \|t\| | 13.63 | – |
Top blocks by max |t|:
- single[19]: max|t|=13.63, 539/768 heads at |t|>3, median |d|=0.92
- single[0]: max|t|=11.74, 401/768 heads at |t|>3, median |d|=0.63
- joint[0]: max|t|=11.27, 137/192 heads at |t|>3, median |d|=0.90
- single[8]: max|t|=11.01, 239/768 heads at |t|>3, median |d|=0.47
- single[13]: max|t|=10.97, 173/768 heads at |t|>3, median |d|=0.34
Interpretation. The axis is real and stable across split halves (r=0.64). Localization is bookend-shaped: the strongest signal sits in single[0] (input-adjacent) and single[19] (output-adjacent), suggesting the counterfactual frame is detected early during text conditioning and re-engaged late during the diffusion-output projection. The deep single[19] block alone has 539 of 768 heads at |t|>3 with median Cohen's d near 0.9, indicating the counterfactual-vs-factual distinction is a load-bearing partition for that block's representation. Image-generation pretraining contains a counterfactual primitive that is structurally separable from the underlying factual content.
Status
Probe complete. No LoRA training; this is a base-model interpretability finding.
Limitations
The counterfactual condition contains an additional clause ("it would not have, had ...") that the factual condition does not. Although within-pair length is matched, the residual signal could partly reflect "presence of secondary clause" rather than counterfactual structure specifically. A follow-up that contrasts counterfactual conditionals against factual conditionals of matched grammatical complexity (e.g., chained "because" clauses) would tighten the claim.
Twenty-five pairs is small; the empirical null is a 1,000-permutation baseline.
The probe is correlational. Heads with high |t| are sensitive to the counterfactual framing in input; whether they participate causally in counterfactual-conditioned generation is a follow-up.
License
Apache 2.0, matching the base FLUX.2 Klein 4B license.
References
- Gabeur, V., Long, S., Peng, S., et al. Image Generators are Generalist Vision Learners. arXiv:2604.20329 (2026).
- Black Forest Labs. FLUX.2 Klein. https://bfl.ai/models/flux-2-klein (2025).