Fast, lossless LLM inference via dual-view diffusion decoding.
- chiennv/Orthrus-Qwen3-4B
  Text Generation • 5B • Updated • 742 • 9
- chiennv/Orthrus-Qwen3-8B
  Text Generation • 10B • Updated • 2.75k • 20
- chiennv/Orthrus-Qwen3-1.7B
  Text Generation • 2B • Updated • 1.48k • 8
- Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion
  Paper • 2605.12825 • Published • 12