Is `tie_word_embeddings: false` correct for this model?

#17

by depasquale - opened Jun 26, 2025

Jun 26, 2025

The 0.6B and 4B models have `tie_word_embeddings` set to `true` in `config.json`, but this model has it set to `false`. Is this correct?

Jay-v2

Jul 30, 2025

`tie_word_embeddings` has no effect on embedding models, because an embedding model reads the final hidden states and never calls `lm_head`. The setting only matters for generative models.

For example, Qwen3-Reranker-8B has `tie_word_embeddings` set to `false`, so its weight files include a separate `lm_head`. Qwen3-Reranker-0.6B and Qwen3-Reranker-4B have it set to `true`, so their weight files contain no `lm_head`. When you load these models with AutoModelForCausalLM, the 0.6B and 4B models have no `lm_head` of their own, so the weights of the input embedding layer (`embed_tokens.weight`) are reused as `lm_head`. That is what `tie_word_embeddings: true` does: it shares one set of parameters between the embedding layer and `lm_head`.
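To make the tied vs. untied behaviour concrete, here is a toy sketch in plain Python. It is not the transformers implementation: `resolve_lm_head` is a hypothetical helper, nested lists stand in for weight tensors, and the key names `model.embed_tokens.weight` and `lm_head.weight` are assumed for illustration.

```python
# Toy illustration only: plain Python lists stand in for weight tensors,
# and resolve_lm_head is a hypothetical helper, not a transformers API.

EMBED_KEY = "model.embed_tokens.weight"  # assumed checkpoint key name
HEAD_KEY = "lm_head.weight"              # assumed checkpoint key name


def resolve_lm_head(state_dict, tie_word_embeddings):
    """Return the matrix that the output projection would use."""
    if tie_word_embeddings:
        # Tied: the head is the very same object as the input embedding,
        # so the checkpoint does not need to store a separate lm_head.
        return state_dict[EMBED_KEY]
    # Untied: the checkpoint must carry its own lm_head matrix.
    return state_dict[HEAD_KEY]


# Checkpoint shaped like the 0.6B/4B case: no lm_head entry.
tied_ckpt = {EMBED_KEY: [[0.1, 0.2], [0.3, 0.4]]}
tied_head = resolve_lm_head(tied_ckpt, tie_word_embeddings=True)

# Checkpoint shaped like the 8B case: a separate lm_head entry.
untied_ckpt = {
    EMBED_KEY: [[0.1, 0.2], [0.3, 0.4]],
    HEAD_KEY: [[0.5, 0.6], [0.7, 0.8]],
}
untied_head = resolve_lm_head(untied_ckpt, tie_word_embeddings=False)

print(tied_head is tied_ckpt[EMBED_KEY])      # same object when tied
print(untied_head is untied_ckpt[EMBED_KEY])  # different object when untied
```

In the tied case, any update to the embedding matrix is also an update to the head, since both names point at the same parameters. In the untied case the two matrices can diverge during training.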

There is no official explanation for why only the 8B model keeps separate parameters while the 0.6B and 4B models share them. One can only speculate that, for models with larger parameter counts, an independent `lm_head` gives better results.
