Reward Hacking in Reasoning Models - an AIML-TUDA Collection

Reward Hacking in Reasoning Models

updated 4 days ago

Do reasoning LLMs actually reason, or do they learn to game the test? Isomorphic Perturbation Testing (IPT) detects reward hacking in inductive programming tasks (SLR-Bench).

Isomorphic Perturbation Testing

Evaluate rule hypotheses for genuine reasoning vs. shortcuts

AIML-TUDA/SLR-Bench

Viewer • Updated about 1 hour ago • 38.5k • 1.42k • 4

SLR-Bench Leaderboard - Reward Hacking in Reasoning Models

Reward shortcut behavior in LLMs via IPT

LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking

Paper • 2604.15149 • Published Apr 16