From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation Paper • 2602.02536 • Published 19 days ago • 3
Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes? Paper • 2506.14805 • Published Jun 3, 2025 • 3
A Mousetrap: Fooling Large Reasoning Models for Jailbreak with Chain of Iterative Chaos Paper • 2502.15806 • Published Feb 19, 2025 • 2
A Rigorous Benchmark with Multidimensional Evaluation for Deep Research Agents: From Answers to Reports Paper • 2510.02190 • Published Oct 2, 2025 • 19
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published Dec 27, 2024 • 87
Reflection-Bench: probing AI intelligence with reflection Paper • 2410.16270 • Published Oct 21, 2024 • 6
ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models Paper • 2406.14952 • Published Jun 21, 2024