AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts Paper • 2601.11044 • Published 19 days ago • 34
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 27 days ago • 218
MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization Paper • 2601.01554 • Published about 1 month ago • 57
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation Paper • 2512.23576 • Published Dec 29, 2025 • 65
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization Paper • 2511.15705 • Published Nov 19, 2025 • 97
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning Paper • 2507.16812 • Published Jul 22, 2025 • 63
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI Paper • 2406.12753 • Published Jun 18, 2024 • 17