SWE-INTERACT: Reimagining SWE Benchmarks as User-Driven Long-Horizon Coding Sessions
Paper • 2606.30573 • Published • 5
Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR