Turkish Embedding Model — Query Encoder (Best Epoch)
Fine-tuned from selmanbaysan/turkish_embedding_model_fine_tuned using a dual-encoder (bi-encoder) architecture with in-batch contrastive loss on Turkish QA pairs.
Checkpoint selection
- Selected epoch: 20
- Selection metric: validation c-index (rnk) = 0.8881
- These weights correspond to the epoch with the highest validation c-index (rnk) observed during training (not the final epoch).
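The card does not define c-index (rnk). A common reading, assumed here, is a concordance index over ranked answers: for every pair of answers to the same question that have different gold ranks, check whether the model gives the better-ranked answer the higher score. The sketch below computes that fraction for one question, counting score ties as half-concordant. The function name `concordance_index` and the convention that rank 1 is the best answer are hypothetical and not taken from the training code.

```python
from itertools import combinations
from typing import Sequence


def concordance_index(scores: Sequence[float], gold_ranks: Sequence[int]) -> float:
    """Fraction of comparable answer pairs ordered consistently with the gold ranks.

    Assumption: rank 1 is the most relevant answer, so a lower gold rank should
    receive a higher model score. Pairs with equal gold ranks are skipped;
    pairs with equal model scores count as half-concordant.
    """
    if len(scores) != len(gold_ranks):
        raise ValueError("scores and gold_ranks must have the same length")
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(scores)), 2):
        if gold_ranks[i] == gold_ranks[j]:
            continue  # no preferred order between these two answers
        comparable += 1
        # The answer the gold ranking prefers, and the one it ranks lower.
        better, worse = (i, j) if gold_ranks[i] < gold_ranks[j] else (j, i)
        if scores[better] > scores[worse]:
            concordant += 1.0
        elif scores[better] == scores[worse]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Cosine scores for three answers whose gold ranks are 1, 2, 3.
# Pairs (1,2) and (1,3) are ordered correctly, (2,3) is not.
print(concordance_index([0.9, 0.4, 0.6], [1, 2, 3]))  # 0.6666666666666666
```

How per-question values are aggregated into the reported 0.8881 (for example, a mean over questions) is not stated in the card.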
Encoders
| Repo | Role |
|---|---|
| sfidan42/turkish_embedding_fine_tuned_q_enc_best | Query encoder (best epoch) |
| sfidan42/turkish_embedding_fine_tuned_a_enc_best | Answer / passage encoder (best epoch) |
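In a dual-encoder setup the two checkpoints play different roles at retrieval time: passages are encoded once with the answer encoder and cached, and each incoming query is encoded with the query encoder and compared against that cache. The plain-Python sketch below shows only this control flow. `encode_query` and `encode_passage` are hypothetical callables standing in for the two models (for instance, the mean-pooled `embed` function from the Usage section applied to each checkpoint), and cosine scoring is assumed from the Usage example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


class DualEncoderIndex:
    """Caches passage vectors from the answer encoder; scores queries from the query encoder."""

    def __init__(self, passages: Sequence[str],
                 encode_query: Callable[[str], Vector],
                 encode_passage: Callable[[str], Vector]) -> None:
        self.passages = list(passages)
        self.encode_query = encode_query
        # Passages are encoded once, up front, with the answer/passage encoder.
        self.passage_vecs = [encode_passage(p) for p in self.passages]

    def search(self, query: str, k: int = 3) -> List[Tuple[str, float]]:
        q = self.encode_query(query)  # the query side uses the query encoder
        scored = [(p, cosine(q, v)) for p, v in zip(self.passages, self.passage_vecs)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Toy stand-in encoders (hypothetical): fixed 2-d vectors looked up per text.
toy_query_vecs = {"What is the capital?": [1.0, 0.0]}
toy_passage_vecs = {"Ankara is the capital.": [0.9, 0.1], "The river is long.": [0.1, 0.9]}
index = DualEncoderIndex(list(toy_passage_vecs),
                         toy_query_vecs.__getitem__,
                         toy_passage_vecs.__getitem__)
print(index.search("What is the capital?", k=1)[0][0])  # Ankara is the capital.
```

Caching the passage side is what makes the split worthwhile: only the short query has to be encoded per request.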
Training details
- Loss: In-batch cross-entropy (InfoNCE), temperature=0.1
- Optimizer: AdamW, lr=5e-6, gradient clipping=1.0
- Train sets: sfidan42/Quora_QA_Turkish_First_Answers, sfidan42/Quora_QA_Turkish_Second_Answers
- Eval set: sfidan42/Turkish-QA-Ranked
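With in-batch negatives, each query in a batch of B question-answer pairs treats its own answer as the positive and the other B-1 answers as negatives. The loss is the cross-entropy of a softmax over similarities divided by the temperature (0.1 above). The pure-Python sketch below assumes cosine similarity (as in the Usage example) and only the query-to-answer direction; whether the actual training code also added the answer-to-query direction is not stated in the card.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def info_nce_loss(q_vecs: Sequence[Sequence[float]],
                  a_vecs: Sequence[Sequence[float]],
                  temperature: float = 0.1) -> float:
    """Mean in-batch InfoNCE loss, query -> answer direction only.

    Row i of q_vecs is paired with row i of a_vecs; the other answers in the
    batch serve as negatives for query i.
    """
    if len(q_vecs) != len(a_vecs):
        raise ValueError("q_vecs and a_vecs must have the same batch size")
    qs = [_normalize(q) for q in q_vecs]
    ans = [_normalize(a) for a in a_vecs]
    total = 0.0
    for i, q in enumerate(qs):
        # logits[j] = cos(q_i, a_j) / temperature
        logits = [sum(x * y for x, y in zip(q, a)) / temperature for a in ans]
        m = max(logits)  # shift by the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log(softmax(logits)[i])
    return total / len(qs)


queries = [[1.0, 0.0], [0.0, 1.0]]
answers = [[1.0, 0.1], [0.1, 1.0]]
aligned = info_nce_loss(queries, answers)                  # each query nearest its own answer
swapped = info_nce_loss(queries, list(reversed(answers)))  # positives mismatched
print(aligned < swapped)  # True
```

In PyTorch the same quantity is usually written as `torch.nn.functional.cross_entropy(q @ a.T / temperature, torch.arange(B))` on L2-normalized embedding matrices. The optimizer line would correspond to `torch.optim.AdamW(params, lr=5e-6)` and, assuming clipping is by global gradient norm, `torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)` before each `optimizer.step()`.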
Usage
from transformers import AutoTokenizer, AutoModel
import torch

Q_REPO = "sfidan42/turkish_embedding_fine_tuned_q_enc_best"  # query encoder
A_REPO = "sfidan42/turkish_embedding_fine_tuned_a_enc_best"  # answer / passage encoder

# Assumes both encoders share the tokenizer of their common base model.
tokenizer = AutoTokenizer.from_pretrained(Q_REPO)
q_model = AutoModel.from_pretrained(Q_REPO)
a_model = AutoModel.from_pretrained(A_REPO)

def embed(texts, model):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        o = model(**enc).last_hidden_state  # (batch, seq_len, hidden)
    m = enc["attention_mask"].unsqueeze(-1).to(o.dtype)  # (batch, seq_len, 1)
    return (o * m).sum(1) / m.sum(1)  # mean pooling over non-padding tokens

query_vec = embed(["Türkiye'nin başkenti neresidir?"], q_model)
passage_vec = embed(["Türkiye'nin başkenti Ankara'dır."], a_model)
score = torch.nn.functional.cosine_similarity(query_vec, passage_vec)
print(score.item())
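The `(o * m).sum(1) / m.sum(1)` line averages token vectors over real tokens only, so the padding introduced by `padding=True` does not change a sentence's embedding when it is batched with longer sentences. The plain-Python sketch below reproduces that masked mean on toy nested lists (hypothetical values) so the behaviour can be checked without downloading the model; for a 0/1 attention mask, multiplying by the mask and filtering by it are equivalent.

```python
from typing import List, Sequence


def masked_mean(token_vecs: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1."""
    kept = [vec for vec, keep in zip(token_vecs, mask) if keep]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Two real tokens followed by one padding position (mask 0).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
print(masked_mean(tokens, [1, 1, 0]))   # [2.0, 3.0]
print(masked_mean(tokens[:2], [1, 1]))  # [2.0, 3.0], same as without padding
```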