evaluation-datasets
edinburgh-dawg/mmlu-redux-2.0 Viewer • Updated Feb 25, 2025 • 5.7k • 6.57k • 35
TIGER-Lab/MMLU-Pro Benchmark • Updated Jan 19 • 12.1k • 83.3k • 431
CohereLabs/Global-MMLU Viewer • Updated Aug 14, 2025 • 602k • 14.5k • 150
Idavidrein/gpqa Benchmark • Updated 29 days ago • 1.25k • 87.2k • 366
smoltalk
Contains the smoltalk dataset in multiple minority languages. The dataset is useful for post-training a base model.
rao254/smoltalk-kik Updated Nov 25, 2025 • 1
rao254/smoltalk-ja Viewer • Updated Nov 25, 2025 • 2.05k • 6
medical-datasets
ruslanmv/ai-medical-chatbot Viewer • Updated Mar 23, 2024 • 257k • 1.21k • 245
michsethowusu/Code-170k-luo Viewer • Updated Oct 30, 2025 • 169k • 24
edinburgh-dawg/mmlu-redux-2.0 Viewer • Updated Feb 25, 2025 • 5.7k • 6.57k • 35