Pretraining Data
opencsg/Fineweb-Edu-Chinese-V2.1 • 958M rows • 55.8k downloads • 63 likes
142k downloads • 29 likes
3.8B rows • 14.1k downloads • 106 likes
allenai/dolma3_dolmino_pool • 87.9k downloads • 7 likes
allenai/dolma3_longmino_pool • 49.4k downloads • 10 likes
476M rows • 34.4k downloads • 817 likes
4.48B rows • 72.4k downloads • 754 likes
61.6M rows • 6.33k downloads • 284 likes
819M rows • 53.8k downloads • 11 likes
tokyotech-llm/swallow-code-v2 • 147M rows • 174k downloads • 31 likes
ByteDance-Seed/Code-Contests-Plus • 49.2k rows • 26.1k downloads • 60 likes
7.09M rows • 5.08k downloads • 158 likes
nvidia/Nemotron-Pretraining-Code-v2 • 836M rows • 3.36k downloads • 103 likes
nvidia/Nemotron-Pretraining-Specialized-v1 • 60.7M rows • 3.89k downloads • 69 likes
nvidia/Nemotron-CC-Math-v1 • 190M rows • 3.52k downloads • 66 likes
nvidia/Nemotron-Pretraining-SFT-v1 • 299M rows • 2.81k downloads • 62 likes
1.86M rows • 17.7k downloads • 225 likes
EssentialAI/essential-web-v1.0 • 112k downloads • 218 likes
EssentialAI/eai-taxonomy-stem-w-dclm • 360 downloads • 6 likes
EssentialAI/eai-taxonomy-med-w-dclm • 81.2M rows • 276 downloads • 8 likes
EssentialAI/eai-taxonomy-code-w-dclm • 274M rows • 85.2k downloads • 9 likes
EssentialAI/eai-taxonomy-math-w-fm • 21.6M rows • 215 downloads • 5 likes
27.9B rows • 26 downloads • 3 likes
DataMuncher-Labs/UltiMath • 32.9B rows • 17.8k downloads • 42 likes
HuggingFaceFW/finetranslations • 3.33B rows • 45.6k downloads • 270 likes
69.9k rows • 60k downloads • 354 likes