MixMinMatch
Collection of datasets from the MixMinMatch work.

Mix, MinHash, and Match: Cross-Source Agreement for Multilingual Pretraining Datasets
Paper • 2512.18834 • Published Dec 21, 2025 • 4

AdaMLLab/AraMix • Updated Jan 30 • 394M • 1.32k • 5
AdaMLLab/TurMix • Updated Jan 30 • 681M • 959 • 4
AdaMLLab/HinMix • Updated Jan 30 • 179M • 1.09k • 1