A collection of training corpora and models for "Multilingual Language Model Pretraining using Machine-translated Data".
BritLLM
Contact: contact@llm.org.uk
Datasets (18):
britllm/TransWebEdu
britllm/TransWeb-Edu-English
britllm/TransWeb-Edu-Spanish
britllm/TransWeb-Edu-French
britllm/TransWeb-Edu-German
britllm/xnli_brit
britllm/piqa_scottish_gaelic
britllm/piqa_welsh
britllm/piqa_irish
britllm/arc_scottish_gaelic