Beyond Test-Time Training: Learning to Reason via Hardware-Efficient Optimal Control
Abstract
The Test-Time Control (TTC) layer integrates optimal control theory into language models to enhance reasoning, using LQR planning and a hardware-efficient solver to improve mathematical problem-solving performance.
Associative memory has long underpinned the design of sequential models. Beyond recall, humans reason by projecting future states and selecting goal-directed actions, a capability that modern language models increasingly require but do not natively encode. Prior work relies on reinforcement learning or test-time training, but in both cases planning remains external to the model architecture. We formulate reasoning as optimal control and introduce the Test-Time Control (TTC) layer, which performs finite-horizon LQR planning over latent states at inference time, represents a value function within the neural architecture, and uses it as a nested objective so that the model plans before it predicts. To ensure scalability, we derive a hardware-efficient LQR solver based on a symplectic formulation and implement it as a fused CUDA kernel, enabling parallel execution with minimal overhead. Integrated as an adapter into pretrained LLMs, TTC layers improve mathematical reasoning performance by up to 27.8% on MATH-500 and yield 2-3x improvements in Pass@8 on AMC and AIME, demonstrating that embedding optimal control as an architectural component provides an effective and scalable mechanism for reasoning beyond test-time training.
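To make "finite-horizon LQR planning" concrete, the following is a minimal sketch of the textbook backward Riccati recursion for a scalar system. It is not the paper's symplectic solver or its CUDA kernel, and it does not operate on LLM latent states. The function names `lqr_backward` and `rollout`, the scalar dynamics `x[t+1] = a*x[t] + b*u[t]`, and all numeric values are assumptions chosen for illustration.

```python
# Toy scalar finite-horizon LQR (illustrative assumption, not the TTC layer).
# Dynamics: x[t+1] = a*x[t] + b*u[t]
# Cost:     sum_{t<T} (q*x[t]^2 + r*u[t]^2) + q_f*x[T]^2


def lqr_backward(a, b, q, r, q_f, horizon):
    """Backward Riccati recursion.

    Returns feedback gains k[t] (so that u[t] = -k[t]*x[t]) and the
    value-function coefficients p[t] (so that V_t(x) = p[t]*x^2).
    """
    p = q_f
    gains = [0.0] * horizon
    values = [0.0] * (horizon + 1)
    values[horizon] = p
    for t in range(horizon - 1, -1, -1):
        k = (a * b * p) / (r + b * b * p)   # optimal gain at step t
        p = q + a * a * p - a * b * p * k   # Riccati update for p[t]
        gains[t] = k
        values[t] = p
    return gains, values


def rollout(a, b, q, r, q_f, gains, x0):
    """Apply the planned feedback gains forward and accumulate the cost."""
    x, cost = x0, 0.0
    for k in gains:
        u = -k * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    cost += q_f * x * x
    return x, cost


if __name__ == "__main__":
    params = dict(a=1.1, b=0.5, q=1.0, r=0.1, q_f=1.0)
    gains, values = lqr_backward(horizon=20, **params)
    x_final, cost = rollout(gains=gains, x0=2.0, **params)
    print("predicted cost:", values[0] * 2.0 ** 2)
    print("realized cost: ", cost)
    print("final state:   ", x_final)
```

The backward pass yields a quadratic value function before any action is taken, which is the sense in which the controller plans before it acts; the forward rollout then realizes the cost that the value function predicted. The recursion shown here is sequential in the horizon, which is the bottleneck the abstract's parallel, hardware-efficient solver is said to address.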
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Training Large Reasoning Models Efficiently via Progressive Thought Encoding (2026)
- $\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space (2026)
- Causal World Modeling for Robot Control (2026)
- Learning Adaptive LLM Decoding (2026)
- ReasonCACHE: Teaching LLMs To Reason Without Weight Updates (2026)
- Dynamic Long Context Reasoning over Compressed Memory via End-to-End Reinforcement Learning (2026)
- Environment-Aware Adaptive Pruning with Interleaved Inference Orchestration for Vision-Language-Action Models (2026)