Is `tie_word_embeddings: false` correct for this model?
The 0.6B and 4B models have tie_word_embeddings set to true in config.json, but for this model it is set to false. Is this correct?
"tie_word_embeddings" has no impact on embedding models; it only affects generative models.
Take Qwen3-Reranker-8B as an example: "tie_word_embeddings" is set to false, so its weight files include a separate lm_head. In Qwen3-Reranker-0.6B and Qwen3-Reranker-4B it is set to true, so their weight files contain no lm_head. When you load the 0.6B or 4B model with AutoModelForCausalLM, the missing lm_head is filled in automatically with the weights of the input embedding layer (embed_tokens.weight). That is what setting "tie_word_embeddings" to true does: the embedding layer and lm_head share one set of parameters.
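The sharing described above can be sketched in a few lines of plain Python. This is only a toy illustration, not the Transformers implementation: nested lists stand in for tensors, the helper `resolve_lm_head` and both tiny checkpoints are hypothetical, and the key names are assumed to follow the `embed_tokens.weight` / `lm_head` naming mentioned above.

```python
# Toy illustration of weight tying; NOT the Transformers implementation.
# Plain nested lists stand in for tensors, and every name below
# (resolve_lm_head, the tiny checkpoints) is hypothetical.

EMBED_KEY = "model.embed_tokens.weight"  # assumed checkpoint key name
LM_HEAD_KEY = "lm_head.weight"           # assumed checkpoint key name


def resolve_lm_head(state_dict, tie_word_embeddings):
    """Return the matrix a causal-LM output head would use."""
    if tie_word_embeddings:
        # Tied: the output projection reuses the input embedding matrix,
        # so the checkpoint does not need to store lm_head at all.
        return state_dict[EMBED_KEY]
    # Untied: the checkpoint must ship its own lm_head matrix.
    return state_dict[LM_HEAD_KEY]


# Vocabulary of 3 tokens, hidden size 2 (made-up numbers).
tied_ckpt = {EMBED_KEY: [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]}
untied_ckpt = {
    EMBED_KEY: [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    LM_HEAD_KEY: [[0.9, 0.8], [0.7, 0.6], [0.5, 0.4]],
}

tied_head = resolve_lm_head(tied_ckpt, tie_word_embeddings=True)
untied_head = resolve_lm_head(untied_ckpt, tie_word_embeddings=False)

print(tied_head is tied_ckpt[EMBED_KEY])      # True: one shared matrix
print(untied_head is untied_ckpt[EMBED_KEY])  # False: separate lm_head
```

With tying, a checkpoint stores one vocab_size × hidden_size matrix fewer. An embedding model never calls the output head, which is why the flag makes no difference there.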
The Qwen team has not explained why only the 8B model keeps a separate lm_head while the 0.6B and 4B models share parameters. One can only speculate that, for models with more parameters, an independent lm_head gives better results.