Scaling Pedagogical Pre-training: From Optimal Mixing to 10 Billion Tokens (article, 25 days ago)