Running Featured 90 Distilling 100B+ Models 40x Faster with TRL 📝 90 TRL distillation for 100B+ teachers, 40x faster
Running on CPU Upgrade Featured 3.22k The Smol Training Playbook 📚 3.22k The secrets to building world-class LLMs
view article Article Mixture of Experts Explained +4 osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq • Dec 11, 2023 • 1.15k
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models Paper • 2312.17661 • Published Dec 29, 2023 • 15