Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning
Abstract
Large reasoning models exhibit spontaneous question repetition patterns that can be formalized and leveraged to improve computational efficiency and accuracy through echo-aware training and prompting techniques.
Test-time compute allocation in large reasoning models (LRMs) is widely used and has applications in mathematical problem solving, code synthesis, and planning. Recent work has addressed this problem by scaling self-consistency and parallel thinking, adding generic ``thinking tokens'' and prompting models to re-read the question before answering. Unfortunately, these approaches either inject task-agnostic tokens or mandate heuristics that do not explain -- and often ignore -- the spontaneous repetition that many LRMs exhibit at the head of their internal chains. In contrast, we analyze and harness the model's tendency to restate the question, which we term the Echo of Prompt (EOP), as a front-loaded, compute-shaping mechanism. We formalize its probabilistic cost by casting echo removal as rejection-based conditioning and defining the Echo Likelihood Gap ΔL as a computable proxy. This provides the missing theoretical link that links early repetition to likelihood gains and downstream accuracy. However, it does not by itself specify how to exploit EOP. Consequently, we develop Echo-Distilled SFT (ED-SFT) to instill an ``echo-then-reason'' pattern through supervised finetuning, and Echoic Prompting (EP) to re-ground the model mid-trace without training. While promising, quantifying benefits beyond verbosity is non-trivial. Therefore, we conduct length and suffix-controlled likelihood analyses together with layer-wise attention studies, showing that EOP increases answer to answer-prefix attention in middle layers, consistent with an attention refocusing mechanism. We evaluate on GSM8K, MathQA, Hendrycks-MATH, AIME24, and MATH-500 under identical decoding settings and budgets, and find consistent gains over baselines. Code is available at https://github.com/hhh2210/echoes-as-anchors.
Community
Tracing the spontaneous “Echo” phenomenon—where models repeat the user query—we link its emergence to the evolution of CoT and RLVR. Through probabilistic and Attention analyses, we show that Echo functions as an effective attention anchor, and demonstrate that leveraging it via SFT and prompting yields substantial gains in mathematical reasoning.
我们将推理模型中普遍存在的自发“回声”现象(复述用户问题)追溯至 CoT 与 RLVR 的演进。通过概率与Attention 分析,验证其作为有效注意力锚点的作用,并表明通过 SFT 与 Prompting 加以利用,可显著提升数学推理能力。
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper