Nemotron-Pre-Training-Datasets Collection Large-scale pre-training datasets used in the Nemotron family of models. • 15 items • Updated 6 days ago • 164
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play Paper • 2505.02707 • Published May 5, 2025 • 85
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published Feb 10, 2025 • 153
The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models Paper • 2501.09653 • Published Jan 16, 2025 • 12
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models Paper • 2501.09686 • Published Jan 23, 2025 • 41