TempFlow-GRPO: When Timing Matters for GRPO in Flow Models Paper • 2508.04324 • Published Aug 6, 2025 • 11
LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent Paper • 2604.17931 • Published Apr 20 • 1
SAIL: Self-Amplified Iterative Learning for Diffusion Model Alignment with Minimal Human Feedback Paper • 2602.05380 • Published Feb 11
WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces Paper • 2606.09426 • Published 4 days ago • 1
WorldOlympiad: Can Your World Model Survive a Triathlon? Paper • 2606.11129 • Published 3 days ago • 29
SkillOpt: Executive Strategy for Self-Evolving Agent Skills Paper • 2605.23904 • Published 21 days ago • 225
Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization Paper • 2605.15980 • Published 28 days ago • 36