Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates Paper • 2402.18540 • Published Feb 28, 2024
Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction Paper • 2508.03613 • Published Aug 5, 2025
Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities Paper • 2505.12680 • Published May 19, 2025
AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs? Paper • 2507.15887 • Published Jul 19, 2025
Contextual Drag: How Errors in the Context Affect LLM Reasoning Paper • 2602.04288 • Published Feb 4
Goedel-Code-Prover: Hierarchical Proof Search for Open State-of-the-Art Code Verification Paper • 2603.19329 • Published Mar 18
AlgoVeri: An Aligned Benchmark for Verified Code Generation on Classical Algorithms Paper • 2602.09464 • Published Feb 10
Task-Specific Skill Localization in Fine-tuned Language Models Paper • 2302.06600 • Published Feb 13, 2023