VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos Paper • 2506.10857 • Published Jun 12, 2025 • 30
SearchGym: Bootstrapping Real-World Search Agents via Cost-Effective and High-Fidelity Environment Simulation Paper • 2601.14615 • Published Jan 21, 2026 • 1
VisionDirector: Vision-Language Guided Closed-Loop Refinement for Generative Image Synthesis Paper • 2512.19243 • Published Dec 22, 2025 • 1
VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking Paper • 2303.11301 • Published Mar 20, 2023
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks Paper • 2401.14159 • Published Jan 25, 2024 • 6
FocalFormer3D: Focusing on Hard Instance for 3D Object Detection Paper • 2308.04556 • Published Aug 8, 2023 • 10
Mask-Attention-Free Transformer for 3D Instance Segmentation Paper • 2309.01692 • Published Sep 4, 2023 • 1
Focal Sparse Convolutional Networks for 3D Object Detection Paper • 2204.12463 • Published Apr 26, 2022
RL-GPT: Integrating Reinforcement Learning and Code-as-policy Paper • 2402.19299 • Published Feb 29, 2024 • 2
MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models Paper • 2406.13975 • Published Jun 20, 2024
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19, 2024 • 52
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO Paper • 2505.13031 • Published May 19, 2025 • 4
TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation Paper • 2507.18537 • Published Jul 24, 2025 • 18