HomeSafe-Bench: Evaluating Vision-Language Models on Unsafe Action Detection for Embodied Agents in Household Scenarios Paper • 2603.11975 • Published 8 days ago • 11
CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models Paper • 2602.17684 • Published Feb 4 • 22
Query as Anchor: Scenario-Adaptive User Representation via Large Language Model Paper • 2602.14492 • Published Feb 16 • 18
MOVA: Towards Scalable and Synchronized Video-Audio Generation Paper • 2602.08794 • Published Feb 9 • 156
OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions Paper • 2602.05843 • Published Feb 5 • 60
HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing Paper • 2601.21459 • Published Jan 29 • 10
TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents Paper • 2602.02196 • Published Feb 2 • 35
SafeGround: Know When to Trust GUI Grounding Models via Uncertainty Calibration Paper • 2602.02419 • Published Feb 2 • 4
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published Jan 29 • 155
SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization Paper • 2601.22491 • Published Jan 30 • 12
Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning Paper • 2601.20209 • Published Jan 28 • 23
A^3-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation Paper • 2601.09274 • Published Jan 14 • 85
Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning Paper • 2601.03872 • Published Jan 7 • 44
HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices Paper • 2512.14052 • Published Dec 16, 2025 • 42
From Imitation to Discrimination: Toward A Generalized Curriculum Advantage Mechanism Enhancing Cross-Domain Reasoning Tasks Paper • 2512.02580 • Published Dec 2, 2025 • 28
Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints Paper • 2510.08549 • Published Oct 9, 2025 • 7
Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities Paper • 2505.15692 • Published May 21, 2025 • 14
DReSS: Data-driven Regularized Structured Streamlining for Large Language Models Paper • 2501.17905 • Published Jan 29, 2025 • 2