Turkish Embedding Model — Query Encoder (Best Epoch)

Fine-tuned from selmanbaysan/turkish_embedding_model_fine_tuned using a dual-encoder (bi-encoder) architecture with in-batch contrastive loss on Turkish QA pairs.

Checkpoint selection

  • Selected epoch: 20
  • Selection metric: validation c-index (rnk) = 0.8881
  • These weights correspond to the epoch with the highest validation c-index (rnk) observed during training (not the final epoch).
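The card does not define "c-index (rnk)". The sketch below is one plausible reading. It assumes a pairwise concordance index between the model's similarity scores and the gold answer ranks in sfidan42/Turkish-QA-Ranked. It also assumes that rank 1 is the best answer and that the per-question values are averaged over the validation set. The helper `select_best_epoch` is hypothetical and shows how a "best epoch" checkpoint is picked from per-epoch metric values.

```python
from itertools import combinations
from typing import Dict, Iterable, Sequence, Tuple


def concordance_index(scores: Sequence[float], ranks: Sequence[int]) -> float:
    """Fraction of comparable answer pairs whose scores agree with the gold ranking.

    Assumption: a lower rank means a better answer and should get a higher score.
    Pairs with tied gold ranks are skipped; tied scores count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(scores)), 2):
        if ranks[i] == ranks[j]:
            continue  # gold ranking expresses no preference for this pair
        comparable += 1
        better, worse = (i, j) if ranks[i] < ranks[j] else (j, i)
        if scores[better] > scores[worse]:
            concordant += 1.0
        elif scores[better] == scores[worse]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def mean_c_index(questions: Iterable[Tuple[Sequence[float], Sequence[int]]]) -> float:
    """Average the per-question c-index; questions: (scores, ranks) pairs."""
    values = [concordance_index(s, r) for s, r in questions]
    values = [v for v in values if v == v]  # drop NaN (no comparable pairs)
    return sum(values) / len(values) if values else float("nan")


def select_best_epoch(metric_by_epoch: Dict[int, float]) -> int:
    """Hypothetical helper: epoch with the highest validation metric (not the last one)."""
    return max(metric_by_epoch, key=metric_by_epoch.get)
```

Under this reading, a score of 1.0 means every answer pair is ordered as in the gold ranking. A score of 0.5 is what random scores would achieve on average.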

Encoders

  • sfidan42/turkish_embedding_fine_tuned_q_enc_best: query encoder (best epoch)
  • sfidan42/turkish_embedding_fine_tuned_a_enc_best: answer / passage encoder (best epoch)

Training details

  • Loss: In-batch cross-entropy (InfoNCE), temperature=0.1
  • Optimizer: AdamW, lr=5e-6, gradient clipping=1.0
  • Train sets: sfidan42/Quora_QA_Turkish_First_Answers, sfidan42/Quora_QA_Turkish_Second_Answers
  • Eval set: sfidan42/Turkish-QA-Ranked
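The sketch below shows a single training step built from the settings listed above: two towers, in-batch InfoNCE with temperature=0.1, AdamW with lr=5e-6, and clipping at 1.0. It does not reproduce the actual training script. Several details are assumptions: embeddings are L2-normalized before the dot product, the loss runs in the query-to-answer direction only, and "gradient clipping=1.0" means clipping the global gradient norm. The `nn.Linear` towers are hypothetical stand-ins for the two transformer encoders.

```python
import torch
import torch.nn.functional as F
from torch import nn


def info_nce_loss(q: torch.Tensor, a: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """In-batch InfoNCE: answer i is the positive for query i; the other answers are negatives."""
    q = F.normalize(q, dim=-1)
    a = F.normalize(a, dim=-1)
    logits = q @ a.T / temperature                       # (batch, batch) cosine similarity / T
    targets = torch.arange(q.size(0), device=q.device)   # positives lie on the diagonal
    return F.cross_entropy(logits, targets)


def train_step(q_encoder, a_encoder, optimizer, q_inputs, a_inputs,
               temperature: float = 0.1, max_norm: float = 1.0) -> float:
    q_encoder.train()
    a_encoder.train()
    optimizer.zero_grad()
    loss = info_nce_loss(q_encoder(q_inputs), a_encoder(a_inputs), temperature)
    loss.backward()
    # Assumption: clip the global gradient norm across both encoders.
    params = [p for group in optimizer.param_groups for p in group["params"]]
    torch.nn.utils.clip_grad_norm_(params, max_norm=max_norm)
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    # Hypothetical toy towers standing in for the query and answer transformers.
    q_enc, a_enc = nn.Linear(16, 8), nn.Linear(16, 8)
    opt = torch.optim.AdamW(list(q_enc.parameters()) + list(a_enc.parameters()), lr=5e-6)
    x_q, x_a = torch.randn(4, 16), torch.randn(4, 16)
    print(train_step(q_enc, a_enc, opt, x_q, x_a))
```

Every other answer in the batch serves as a negative, so larger batches give each query more negatives at no extra labelling cost.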

Usage

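from transformers import AutoTokenizer, AutoModel
import torch

Q_REPO = "sfidan42/turkish_embedding_fine_tuned_q_enc_best"
A_REPO = "sfidan42/turkish_embedding_fine_tuned_a_enc_best"

# Both encoders are fine-tuned from the same base model, so one tokenizer is assumed to serve both.
tokenizer = AutoTokenizer.from_pretrained(Q_REPO)
q_model   = AutoModel.from_pretrained(Q_REPO).eval()  # encodes queries
a_model   = AutoModel.from_pretrained(A_REPO).eval()  # encodes answers / passages

def embed(texts, model):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        o = model(**enc).last_hidden_state                    # (batch, seq, hidden)
        m = enc["attention_mask"].unsqueeze(-1).to(o.dtype)   # (batch, seq, 1)
        return (o * m).sum(1) / m.sum(1).clamp(min=1e-9)      # mean pooling over real tokens

# Queries go through the query encoder, passages through the answer encoder.
query_vec   = embed(["Türkiye'nin başkenti neresidir?"], q_model)
passage_vec = embed(["Türkiye'nin başkenti Ankara'dır."], a_model)
score = torch.nn.functional.cosine_similarity(query_vec, passage_vec)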
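For retrieval over many passages, the same cosine score can rank a whole set of passages at once. The sketch below assumes `query_vec` is a query-encoder embedding like the one produced above. It also assumes `passage_vecs` stacks answer-encoder embeddings, one row per passage. `rank_passages` is a hypothetical helper, not part of the released models.

```python
import torch
import torch.nn.functional as F


def rank_passages(query_vec: torch.Tensor, passage_vecs: torch.Tensor, top_k: int = 5):
    """Return (scores, indices) of the top_k passages by cosine similarity.

    query_vec:    (1, hidden) or (hidden,) embedding from the query encoder.
    passage_vecs: (num_passages, hidden) embeddings from the answer encoder.
    """
    q = F.normalize(query_vec.reshape(1, -1), dim=-1)
    p = F.normalize(passage_vecs, dim=-1)
    scores = (q @ p.T).squeeze(0)           # (num_passages,) cosine similarities
    k = min(top_k, scores.numel())          # never ask topk for more rows than exist
    top = torch.topk(scores, k)
    return top.values, top.indices
```

Passage embeddings do not depend on the query, so they can be computed once and reused for every incoming query.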