MHPO: Modulated Hazard-aware Policy Optimization for Stable Reinforcement Learning Paper • 2603.16929 • Published 22 days ago • 13
GigaWorld-Policy: An Efficient Action-Centered World–Action Model Paper • 2603.17240 • Published 18 days ago • 25
V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning Paper • 2603.14482 • Published 20 days ago • 26
AdaMem: Adaptive User-Centric Memory for Long-Horizon Dialogue Agents Paper • 2603.16496 • Published 18 days ago • 13 • 3
Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models Paper • 2603.13985 • Published 21 days ago • 10
RS-WorldModel: a Unified Model for Remote Sensing Understanding and Future Sense Forecasting Paper • 2603.14941 • Published 20 days ago • 8
Neural Field Thermal Tomography: A Differentiable Physics Framework for Non-Destructive Evaluation Paper • 2603.11045 • Published 24 days ago • 2
Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation Paper • 2603.12247 • Published 23 days ago • 23
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training Paper • 2603.12255 • Published 23 days ago • 91
CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR Paper • 2603.10101 • Published 25 days ago • 5
AutoResearch-RL: Perpetual Self-Evaluating Reinforcement Learning Agents for Autonomous Neural Architecture Discovery Paper • 2603.07300 • Published 28 days ago • 17
π-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs Paper • 2603.02083 • Published Mar 2 • 9