CU-Benchmarks
visualwebbench/VisualWebBench
rootsautomation/RICO-ScreenQA
rootsautomation/ScreenSpot
osunlp/Multimodal-Mind2Web
xlangai/ubuntu_osworld_file_cache
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale (paper, arXiv:2409.08264)
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents (paper, arXiv:2405.14573)