cvpr

classroom

Recent Activity

TruemanV5 authored a paper about 1 month ago

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

TruemanV5 authored a paper about 1 month ago

SearchGym: Bootstrapping Real-World Search Agents via Cost-Effective and High-Fidelity Environment Simulation

TruemanV5 authored a paper about 1 month ago

VisionDirector: Vision-Language Guided Closed-Loop Refinement for Generative Image Synthesis

authored 3 papers about 1 month ago

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

Paper • 2506.10857 • Published Jun 12, 2025 • 30

SearchGym: Bootstrapping Real-World Search Agents via Cost-Effective and High-Fidelity Environment Simulation

Paper • 2601.14615 • Published Jan 21, 2026 • 1

VisionDirector: Vision-Language Guided Closed-Loop Refinement for Generative Image Synthesis

Paper • 2512.19243 • Published Dec 22, 2025 • 1

authored 17 papers 6 months ago

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking

Paper • 2303.11301 • Published Mar 20, 2023

Spherical Transformer for LiDAR-based 3D Recognition

Paper • 2303.12766 • Published Mar 22, 2023

Denoising Diffusion Step-aware Models

Paper • 2310.03337 • Published Oct 5, 2023 • 1

LISA: Reasoning Segmentation via Large Language Model

Paper • 2308.00692 • Published Aug 1, 2023 • 1

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

Paper • 2401.14159 • Published Jan 25, 2024 • 6

FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

Paper • 2308.04556 • Published Aug 8, 2023 • 10

Mask-Attention-Free Transformer for 3D Instance Segmentation

Paper • 2309.01692 • Published Sep 4, 2023 • 1

Focal Sparse Convolutional Networks for 3D Object Detection

Paper • 2204.12463 • Published Apr 26, 2022

RL-GPT: Integrating Reinforcement Learning and Code-as-policy

Paper • 2402.19299 • Published Feb 29, 2024 • 2

MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

Paper • 2406.13975 • Published Jun 20, 2024

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19, 2024 • 52

NVILA: Efficient Frontier Visual Language Models

Paper • 2412.04468 • Published Dec 5, 2024 • 60

MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO

Paper • 2505.13031 • Published May 19, 2025 • 4

Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10, 2025 • 162

TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation

Paper • 2507.18537 • Published Jul 24, 2025 • 18

3D Aware Region Prompted Vision Language Model

Paper • 2509.13317 • Published Sep 16, 2025 • 14