VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos Paper • 2506.10857 • Published Jun 12, 2025
SearchGym: Bootstrapping Real-World Search Agents via Cost-Effective and High-Fidelity Environment Simulation Paper • 2601.14615 • Published Jan 21, 2026
VisionDirector: Vision-Language Guided Closed-Loop Refinement for Generative Image Synthesis Paper • 2512.19243 • Published Dec 22, 2025