zhangchenxu/BV-Qwen2.5-Math-7B-deepmath_pw_lv-TinyV-1.5B-addon-2_step468 8B • Updated Jul 29, 2025 • 10
Article Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty? 22 days ago • 12
ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents Paper • 2601.12294 • Published Jan 18 • 19