$R^2$-dLLM-LLaDA

$R^2$-dLLM is a unified framework for reducing decoding redundancy in Diffusion Large Language Models (dLLMs) from both inference and training perspectives. This specific checkpoint is a redundancy-aware supervised fine-tuned version of LLaDA-Instruct-8B.
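Since this checkpoint is a fine-tune of LLaDA-Instruct-8B, it can plausibly be loaded the same way as the base model. The sketch below assumes the Hub repository id `ZhenbangDu/R2-dLLM-LLaDA`, that the checkpoint ships custom modeling code (hence `trust_remote_code=True`), and that BF16 weights are wanted; `load_model` is a hypothetical helper name, not something provided by the authors.

```python
MODEL_ID = "ZhenbangDu/R2-dLLM-LLaDA"  # assumed Hub repository id


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model; assumes the LLaDA-style remote-code loading path."""
    # Imported lazily so the module can be inspected without the heavy dependencies.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_id,
        trust_remote_code=True,      # assumption: custom modeling code lives in the repo
        torch_dtype=torch.bfloat16,  # assumption: load the weights in BF16
    )
    return tokenizer, model.eval()
```

Calling `load_model()` downloads the full 8B-parameter checkpoint, so it needs sufficient disk space and memory. The decoding loop itself is not shown here.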

Description

dLLMs predict tokens in parallel but often suffer from high inference latency due to decoding redundancy. $R^2$-dLLM addresses this with two components:

  1. Inference-time rules: Aggregating local confidence and finalized predictions to avoid redundant decoding steps.
  2. Redundancy-aware SFT: Aligning the model with efficient decoding trajectories during training.
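To make the first item more concrete, here is a toy sketch of what a confidence-aggregating selection rule could look like. It is an illustration under stated assumptions, not the rule used by $R^2$-dLLM: the window average, the choice to count finalized tokens as confidence 1.0, the threshold value, and the names `local_confidence` and `select_positions` are all hypothetical.

```python
from typing import List, Optional

MASK = None  # marker for a position that has not been finalized yet


def local_confidence(conf: List[float], i: int, radius: int = 1) -> float:
    """Mean confidence over a window centred on position i (hypothetical aggregation)."""
    lo, hi = max(0, i - radius), min(len(conf), i + radius + 1)
    window = conf[lo:hi]
    return sum(window) / len(window)


def select_positions(
    tokens: List[Optional[str]],
    conf: List[float],
    threshold: float = 0.9,
    radius: int = 1,
) -> List[int]:
    """Pick masked positions to finalize in the current decoding step.

    Assumption: already-finalized neighbours count as confidence 1.0, so a
    masked token next to settled context is accepted earlier.
    """
    effective = [1.0 if t is not MASK else c for t, c in zip(tokens, conf)]
    masked = [i for i, t in enumerate(tokens) if t is MASK]
    if not masked:
        return []
    scores = {i: local_confidence(effective, i, radius) for i in masked}
    chosen = [i for i in masked if scores[i] >= threshold]
    # Always finalize at least one position so decoding makes progress.
    return chosen or [max(masked, key=scores.get)]


if __name__ == "__main__":
    tokens = ["The", MASK, MASK, MASK]
    conf = [0.0, 0.95, 0.5, 0.2]  # per-position model confidence (made-up values)
    print(select_positions(tokens, conf, threshold=0.5))
```

Finalizing several positions in one step whenever their aggregated score clears the threshold is what lets a rule like this skip decoding steps; the fallback keeps the loop from stalling when nothing qualifies.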

Experiments show that $R^2$-dLLM reduces the number of decoding steps by up to 88% compared to existing decoding strategies, while maintaining competitive generation quality across different models and tasks.
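As a rough guide to what the 88% figure means, the arithmetic below converts a step reduction into a remaining step count and a step-count ratio. The baseline of 256 steps is a made-up example, not a number from the paper, and fewer steps does not necessarily translate into the same wall-clock speedup.

```python
import math


def steps_after_reduction(baseline_steps: int, reduction: float) -> int:
    """Decoding steps left after removing a fraction `reduction` of the baseline."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return math.ceil(baseline_steps * (1.0 - reduction))


def step_ratio(reduction: float) -> float:
    """How many times fewer steps are run, e.g. 0.5 -> 2.0."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


if __name__ == "__main__":
    # Hypothetical baseline: 256 decoding steps, best-case 88% reduction.
    print(steps_after_reduction(256, 0.88), round(step_ratio(0.88), 2))
```

At the best-case 88% reduction, a hypothetical 256-step schedule would shrink to about 31 steps, roughly 8.3 times fewer.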

Citation

@article{du2026r,
  title={$R^{2}$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction},
  author={Du, Zhenbang and Xia, Kejing and Zhong, Xinrui and Fu, Yonggan and Oswald, Nicolai and Ji, Binfei and Khailany, Brucek and Molchanov, Pavlo and Lin, Yingyan},
  journal={arXiv preprint arXiv:2604.18995},
  year={2026}
}

Model size: 8B params (Safetensors). Tensor type: BF16.