SurveyBench: How Well Can LLM(-Agents) Write Academic Surveys? Paper • 2510.03120 • Published Oct 3, 2025 • 7
CoDA: Agentic Systems for Collaborative Data Visualization Paper • 2510.03194 • Published Oct 3, 2025 • 30
T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables Paper • 2508.19813 • Published Aug 27, 2025 • 28
Introducing AI Sheets: a tool to work with datasets using open AI models! Article • Published Aug 8, 2025 • 108
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers Paper • 2508.20453 • Published Aug 28, 2025 • 63