Article Introducing SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding • 6 days ago • 43
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 Text Generation • 32B • Updated 11 days ago • 1.39M • 324
Llama Nemotron Collection Open, Production-ready Enterprise Models • 12 items • Updated 1 day ago • 77
nvidia/Llama-3_3-Nemotron-Super-49B-v1 Text Generation • 50B • Updated Oct 15, 2025 • 32.3k • 321
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs Paper • 2411.19146 • Published Nov 28, 2024 • 17