[Analysis] Why Gemma Feels "Chaotic": A Geometric Audit of Trajectory Instability & The 84% Reopening Rate
Hi Gemma Community,
I’ve been conducting a large-scale methodological audit of trajectory instability across 17 open-source models (including Gemma-2B, Llama-3.2, and Qwen-2.5), and I wanted to share some specific findings regarding Gemma’s internal dynamics.
While Qwen models tend to cluster in a stable "Adaptive" regime, Gemma-2B often exhibits a "High-Spike" or "Chaotic" dynamic profile in our amplitude-based metrics. But here is the nuance: this isn't necessarily a flaw; it's a structural feature of how it processes uncertainty.
Key Findings for Gemma Users:
The "Chaotic" Signature: In our panel, Gemma-2B showed higher ratio_norm values (indicating larger spikes in hidden-state amplitude) than Qwen or Phi. This correlates with its tendency to explore more diverse token paths before converging.
The 84% Reopening Rate: We identified a robust COLLAPSE-RIVALRY cycle at the token level. When Gemma enters a low-entropy "Collapse" state (high certainty), it returns to a high-entropy "Rivalry" state (re-evaluating options) 84% of the time. This suggests Gemma is constantly self-correcting, which can look like hesitation but is actually a rigorous verification loop.
Prompt Sensitivity Matters: Our variance decomposition attributes 17% of dynamic variance to prompt category. Gemma’s instability spikes significantly on scientific_reasoning prompts but stabilizes on factual_easy tasks. If you’re seeing erratic outputs, check whether the prompt requires multi-hop reasoning: Gemma’s geometry changes drastically under cognitive load.
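If you want to measure a reopening rate on your own traces, here is a minimal sketch. It assumes you already have a list of per-token entropies (in nats) and that "Collapse" and "Rivalry" can be approximated by simple entropy thresholds. The function name `reopening_rate`, the thresholds `low` and `high`, and the lookahead `window` are illustrative assumptions, not the definitions used in the paper.

```python
import math
from typing import Sequence


def reopening_rate(entropies: Sequence[float],
                   low: float = 0.5,
                   high: float = 2.0,
                   window: int = 5) -> float:
    """Fraction of collapse runs that are followed by a rivalry token.

    A collapse run is a maximal stretch of tokens with entropy < low.
    It counts as "reopened" if any of the `window` tokens right after
    the run has entropy > high. All thresholds are hypothetical.
    """
    collapses = 0
    reopened = 0
    n = len(entropies)
    i = 0
    while i < n:
        if entropies[i] < low:
            # Advance j to the first token after this collapse run.
            j = i
            while j < n and entropies[j] < low:
                j += 1
            collapses += 1
            if any(h > high for h in entropies[j:j + window]):
                reopened += 1
            i = j
        else:
            i += 1
    # No collapse runs means the rate is undefined.
    return reopened / collapses if collapses else math.nan


# Toy trace: three collapse runs, only the first one reopens.
trace = [3.0, 0.2, 0.1, 2.5, 1.0, 0.3, 1.0, 1.2, 0.1, 0.2]
rate = reopening_rate(trace)
```

Note that a collapse run at the very end of a sequence is counted as not reopened, which biases the rate downward on short generations. You may prefer to drop such truncated runs.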
The Methodological Audit:
I’ve published a full working paper detailing these findings, including:
Why normalisation choices (CLIPPED_MAD vs RAW) change how we see Gemma’s stability.
Why "families" of models dissolve at scale (n=17).
6 documented falsifications of common stability hypotheses.
Read the full audit here: https://doi.org/10.5281/zenodo.20361289
See the complementary "4 Regimes" paper here: https://doi.org/10.5281/zenodo.20348878
Practical Tip:
If you’re using Gemma for complex reasoning, consider monitoring its logit entropy. High entropy spikes often precede a "Reopening" event. Using a slight temperature adjustment during these spikes might help guide its self-correction process.
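A minimal sketch of that monitoring idea, in plain Python so it does not depend on any particular inference library. It assumes you can get the next-token logits at each decoding step. The helper names, the `spike_threshold`, and both temperature values are hypothetical, and lowering the temperature during a spike is just one possible reading of "slight adjustment".

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def choose_temperature(logits: Sequence[float],
                       base_temperature: float = 0.8,
                       spike_temperature: float = 0.6,
                       spike_threshold: float = 2.0) -> float:
    """Return a slightly lower temperature when entropy spikes.

    The threshold and both temperatures are illustrative values only;
    tune them on your own prompts.
    """
    if logit_entropy(logits) > spike_threshold:
        return spike_temperature
    return base_temperature


# A peaked distribution (low entropy) keeps the base temperature,
# while a flat one over 16 tokens (entropy = ln 16) triggers the adjustment.
peaked = [10.0, 0.0, 0.0, 0.0]
flat = [0.0] * 16
t_peaked = choose_temperature(peaked)
t_flat = choose_temperature(flat)
```

In a real decoding loop you would call `choose_temperature` on each step's logits before sampling. Logging `logit_entropy` alongside the generated tokens is a cheap way to see whether spikes line up with the hesitation you observe.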
Would love to hear if other users have noticed this "self-correcting" hesitation in Gemma’s outputs!
Best,
Jean-Denis Bosange Batuli
IDChain SRL (Unbind)