Qwen3-4B-A2D-untrained-dllm-convert

This repository contains the Qwen3-4B model converted to the A2D architecture (bidirectional attention), as presented in the paper "Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation".

This artifact is the untrained student initialization for On-Policy Distillation (OPD), the process that transforms an autoregressive model into a diffusion language model.

Model Details

  • Architecture: A2D-Qwen3 (non-causal attention, same weights as original)
  • Parameters: 4.02B
  • Vocab size: 151936
  • Model type: a2d-qwen3

This model has the original Qwen3-4B weights with bidirectional (non-causal) attention. It was converted using the dllm convert pipeline. No diffusion pretraining or SFT has been applied.
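As a toy illustration of what switching from causal to bidirectional attention means, the sketch below computes softmax attention weights with and without a causal mask. It is not the A2D-Qwen3 implementation: the function name `attention_weights` and the uniform scores are made up for the example, and only the masking rule is the point.

```python
import math

def attention_weights(scores, causal):
    """Row-wise softmax over raw scores, optionally under a causal mask.

    Hypothetical helper for illustration only; not part of dllm or Qwen3.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # Under a causal mask, query i may only attend to keys j <= i.
        # Without it (bidirectional), every key is visible to every query.
        visible = [j for j in range(n) if (not causal) or j <= i]
        row_max = max(scores[i][j] for j in visible)
        exps = {j: math.exp(scores[i][j] - row_max) for j in visible}
        total = sum(exps.values())
        weights.append([exps.get(j, 0.0) / total for j in range(n)])
    return weights

# Toy 3-token sequence with uniform (made-up) scores.
scores = [[0.0] * 3 for _ in range(3)]
causal = attention_weights(scores, causal=True)
bidirectional = attention_weights(scores, causal=False)

print(causal[0])         # first token sees only itself
print(bidirectional[0])  # first token sees all three positions
```

With the same scores, and by extension the same weights, the first token attends only to itself under the causal mask but spreads its attention over the whole sequence in the bidirectional case.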

Mask token registration: The mask token <|MASK|> (ID 151669) is registered in the tokenizer for use with diffusion-based language modeling. The original Qwen3 tokenizer includes <|MASK|> in special_tokens_map.json but does not register it in tokenizer_config.json, so tokenizer.mask_token_id returns None. To fix this, we added <|MASK|> to the added_tokens_decoder section and to the mask_token field of tokenizer_config.json, and added the full mask_token entry to special_tokens_map.json. With this fix, tokenizer.mask_token_id returns 151669.
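The registration described above could be reproduced roughly as follows. This is a minimal sketch and not the script actually used for this repository: the helper name `register_mask_token` is hypothetical, and the entry fields (`lstrip`, `normalized`, `rstrip`, `single_word`, `special`) are assumed to follow the usual Hugging Face tokenizer JSON layout.

```python
import json
from pathlib import Path

MASK_TOKEN = "<|MASK|>"
MASK_TOKEN_ID = 151669

def register_mask_token(tokenizer_dir):
    """Hypothetical helper: register the mask token in both tokenizer files."""
    tokenizer_dir = Path(tokenizer_dir)
    # Assumed entry layout, mirroring typical Hugging Face special-token entries.
    entry = {
        "content": MASK_TOKEN,
        "lstrip": False,
        "normalized": False,
        "rstrip": False,
        "single_word": False,
    }

    # tokenizer_config.json: add the token to added_tokens_decoder
    # (keyed by the token ID as a string) and set the mask_token field.
    config_path = tokenizer_dir / "tokenizer_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    decoder = config.setdefault("added_tokens_decoder", {})
    decoder[str(MASK_TOKEN_ID)] = {**entry, "special": True}
    config["mask_token"] = MASK_TOKEN
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
    )

    # special_tokens_map.json: add the full mask_token entry.
    map_path = tokenizer_dir / "special_tokens_map.json"
    special_map = json.loads(map_path.read_text(encoding="utf-8"))
    special_map["mask_token"] = entry
    map_path.write_text(
        json.dumps(special_map, indent=2, ensure_ascii=False), encoding="utf-8"
    )
```

Calling `register_mask_token("path/to/tokenizer")` on a local copy of the tokenizer files edits them in place. Reloading that directory with `transformers.AutoTokenizer.from_pretrained` and reading `mask_token_id` is a simple way to check the result against the ID stated above.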

Citation

@misc{su2026opdlm,
      title={Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation},
      author={Xingyu Su and Jacob Helwig and Shubham Parashar and Atharv Chagi and Lakshmi Jotsna and Degui Zhi and James Caverlee and Dileep Kalathil and Shuiwang Ji},
      year={2026},
      eprint={2606.06712},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2606.06712},
}