Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use Paper • 2605.14038 • Published 8 days ago • 12
CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing Paper • 2605.02910 • Published 15 days ago • 22
Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration? Paper • 2603.03202 • Published Mar 3 • 17
Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind Paper • 2601.15715 • Published Jan 22 • 14
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems Paper • 2601.11004 • Published Jan 16 • 30
The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents Paper • 2601.07264 • Published Jan 12 • 24
CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration? Paper • 2510.24505 • Published Oct 28, 2025 • 4