Resumator v2

Fine-tuned sentence-transformer for resume-to-job matching. Encodes resumes and job descriptions into embeddings for cosine similarity search.

Trained with Matryoshka Representation Learning — supports truncation to 384/256/128 dims at inference with minimal quality loss.

| Property | Value |
|---|---|
| Base Model | all-mpnet-base-v2 (109M params) |
| Native Dimensions | 768 |
| Recommended Dimensions | 384 (Matryoshka truncation) |
| Max Sequence Length | 512 tokens |
| Pooling | Mean pooling + L2 normalization |
| Parameters | ~109.5M |
| Size | ~418 MB |
| Training | MatryoshkaLoss + MultipleNegativesRankingLoss on 15,600 LLM-scored resume-job pairs |

What's New in v2

| Improvement | v1 | v2 |
|---|---|---|
| Base model | all-MiniLM-L6-v2 (22.7M) | all-mpnet-base-v2 (109.5M) |
| Training loss | CosineSimilarityLoss | MatryoshkaLoss + MNRL + CosineSimilarityLoss |
| Training data | 624 pairs | 15,600 LLM-scored pairs |
| Spearman (held-out) | 0.436 | 0.796 |
| Ranking accuracy | 79.9% | 98.0% |

Usage

from sentence_transformers import SentenceTransformer

# Load with Matryoshka truncation to 384 dims
model = SentenceTransformer("shankerram3/resumator", truncate_dim=384)

candidate = "Name: Jane Doe\nSkills: Python, React, PostgreSQL\nResume: Full-stack engineer with 4 years..."
job = "Title: Senior Software Engineer\nCompany: TechCorp\nDescription: Looking for full-stack engineer..."

embeddings = model.encode([candidate, job], normalize_embeddings=True)
similarity = float(embeddings[0] @ embeddings[1])  # cosine similarity
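In a retrieval setting you typically compare one job embedding against a whole matrix of candidate embeddings. A minimal sketch with NumPy — the synthetic unit vectors below stand in for real `model.encode(..., normalize_embeddings=True)` output, and `rank_candidates` is an illustrative helper, not part of the model package:

```python
import numpy as np

def rank_candidates(job_emb: np.ndarray, candidate_embs: np.ndarray, top_k: int = 3):
    """Rank candidates by cosine similarity to a job embedding.

    Assumes all embeddings are already L2-normalized (as with
    normalize_embeddings=True), so a dot product equals cosine similarity.
    """
    scores = candidate_embs @ job_emb          # one dot product per candidate
    order = np.argsort(-scores)[:top_k]        # best matches first
    return [(int(i), float(scores[i])) for i in order]

# Synthetic stand-ins for model.encode output: 1 job + 4 candidates, 384 dims
rng = np.random.default_rng(0)
embs = rng.normal(size=(5, 384))
embs /= np.linalg.norm(embs, axis=1, keepdims=True)  # L2-normalize rows
job_emb, candidate_embs = embs[0], embs[1:]

for idx, score in rank_candidates(job_emb, candidate_embs):
    print(f"candidate {idx}: {score:.3f}")
```

Because everything is normalized, this scales to large candidate pools as a single matrix-vector product, or plugs directly into an ANN index such as FAISS.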

Input Format

Candidate:

Name: {name}
Titles: {title1}, {title2}
Skills: {skill1}, {skill2}, {skill3}
Experience: {years} years
Location: {city}, {state}
Resume: {full_resume_text}

Job:

Title: {job_title}
Company: {company_name}
Location: {city}, {state}
Experience: {years} years
Description: {full_job_description}
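A small helper can assemble these strings from structured records. The function names and dict keys here are illustrative (nothing like this ships with the model); only the output format follows the templates above:

```python
def format_candidate(c: dict) -> str:
    """Build the candidate input string in the format the model expects."""
    return (
        f"Name: {c['name']}\n"
        f"Titles: {', '.join(c['titles'])}\n"
        f"Skills: {', '.join(c['skills'])}\n"
        f"Experience: {c['years']} years\n"
        f"Location: {c['city']}, {c['state']}\n"
        f"Resume: {c['resume']}"
    )

def format_job(j: dict) -> str:
    """Build the job input string in the format the model expects."""
    return (
        f"Title: {j['title']}\n"
        f"Company: {j['company']}\n"
        f"Location: {j['city']}, {j['state']}\n"
        f"Experience: {j['years']} years\n"
        f"Description: {j['description']}"
    )

text = format_candidate({
    "name": "Jane Doe", "titles": ["Software Engineer"],
    "skills": ["Python", "React"], "years": 4,
    "city": "Austin", "state": "TX", "resume": "Full-stack engineer...",
})
print(text.splitlines()[0])  # Name: Jane Doe
```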

Benchmarks

Held-Out Evaluation (Unseen Data)

Evaluated on 500 resume-job pairs using jobs the model never saw during training, scored by Cerebras gpt-oss-120b as ground truth. Run on AMD Instinct MI300X GPU.

| Model | Dims | Params | Spearman | Ranking Acc | Separation | IQR |
|---|---|---|---|---|---|---|
| resumator-v2 (384) | 384 | 109.5M | 0.796 | 98.0% | 0.217 | 0.185 |
| all-mpnet-base-v2 | 768 | 109.5M | 0.511 | 78.7% | 0.124 | 0.160 |
| wynisco-matcher-v1 | 384 | 22.7M | 0.436 | 79.9% | 0.103 | 0.123 |
| all-MiniLM-L6-v2 | 384 | 22.7M | 0.387 | 76.7% | 0.110 | 0.140 |
| bge-base-en-v1.5 | 768 | 109.5M | 0.238 | 61.6% | 0.038 | 0.088 |
| e5-base-v2 | 768 | 109.5M | 0.188 | 55.0% | 0.011 | 0.058 |
  • Spearman: Rank correlation with LLM recruiter scores (higher = better ranking)
  • Ranking Acc: % of (good, bad) pairs where the good match scores higher
  • Separation: Mean cosine sim gap between good matches (≥0.6) and bad matches (≤0.3)
  • IQR: Interquartile range — higher means better discrimination between matches
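The ranking-accuracy and separation metrics are straightforward to reproduce. A dependency-light sketch over toy data (function names are mine; the good/bad cutoffs of 0.6 and 0.3 come from the definitions above):

```python
import numpy as np

def ranking_accuracy(model_scores, llm_scores, good=0.6, bad=0.3):
    """Fraction of (good, bad) pairs where the good match gets the higher model score."""
    good_idx = [i for i, s in enumerate(llm_scores) if s >= good]
    bad_idx = [i for i, s in enumerate(llm_scores) if s <= bad]
    pairs = [(g, b) for g in good_idx for b in bad_idx]
    wins = sum(model_scores[g] > model_scores[b] for g, b in pairs)
    return wins / len(pairs)

def separation(model_scores, llm_scores, good=0.6, bad=0.3):
    """Mean model-score gap between LLM-labeled good and bad matches."""
    goods = [model_scores[i] for i, s in enumerate(llm_scores) if s >= good]
    bads = [model_scores[i] for i, s in enumerate(llm_scores) if s <= bad]
    return float(np.mean(goods) - np.mean(bads))

llm = [0.9, 0.8, 0.7, 0.2, 0.1]          # ground-truth LLM scores
model = [0.65, 0.55, 0.40, 0.45, 0.20]   # model cosine similarities
print(ranking_accuracy(model, llm))       # 5 of 6 pairs correctly ordered
print(separation(model, llm))
```

Spearman is omitted to keep the sketch free of SciPy; `scipy.stats.spearmanr(model, llm)` gives the remaining column.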

MTEB Benchmarks

Standard sentence-transformer evaluation on public benchmarks (not domain-specific):

| Task | v1 | v2 (384) | Delta |
|---|---|---|---|
| STS12 | 0.724 | 0.716 | -0.008 |
| STS13 | 0.808 | 0.833 | +0.025 |
| STS14 | 0.755 | 0.776 | +0.021 |
| STS15 | 0.851 | 0.852 | +0.001 |
| STS16 | 0.785 | 0.793 | +0.007 |
| STSBenchmark | 0.817 | 0.828 | +0.012 |
| SICK-R | 0.775 | 0.804 | +0.028 |
| BIOSSES | 0.798 | 0.812 | +0.014 |
| SprintDuplicateQuestions | 0.941 | 0.907 | -0.034 |
| TwitterSemEval2015 | 0.675 | 0.730 | +0.056 |
| AskUbuntuDupQuestions | 0.633 | 0.654 | +0.021 |
| SciDocsRR | 0.872 | 0.873 | +0.001 |
| StackOverflowDupQuestions | 0.507 | 0.513 | +0.007 |
| Average | 0.765 | 0.777 | +0.012 |

v2 improves on 11 of 13 MTEB tasks despite being fine-tuned for a specific domain.

Speed

MI300X GPU:

| Model | Single | Batch of 50 |
|---|---|---|
| resumator-v2 (384) | 6.15 ms | 13.3 ms (0.27 ms/item) |
| all-mpnet-base-v2 | 6.09 ms | 13.0 ms (0.26 ms/item) |
| wynisco-matcher-v1 | 3.48 ms | 7.6 ms (0.15 ms/item) |

Apple M-series CPU:

| Model | Single | Batch of 50 |
|---|---|---|
| resumator-v2 (384) | 9.6 ms | 72 ms (1.4 ms/item) |
| all-mpnet-base-v2 | 8.2 ms | 61 ms (1.2 ms/item) |
| wynisco-matcher-v1 | 5.7 ms | 15 ms (0.3 ms/item) |

Score Distribution (2,500 resume-job pairs)

| Metric | v1 | v2 (384) |
|---|---|---|
| Mean | 0.546 | 0.376 |
| Std Dev | 0.098 | 0.124 |
| IQR | 0.133 | 0.186 |
| Min / Max | 0.211 / 0.859 | 0.025 / 0.714 |

v2 distribution:
0.0-0.1:                                             13 (  0.5%)
0.1-0.2: ########                                   149 (  6.0%)
0.2-0.3: ################################           590 ( 23.6%)
0.3-0.4: ########################################   722 ( 28.9%)
0.4-0.5: ##############################             543 ( 21.7%)
0.5-0.6: ######################                     410 ( 16.4%)
0.6-0.7: ###                                         72 (  2.9%)
0.7-0.8:                                              1 (  0.0%)

v2 spreads scores more widely (IQR 0.186 vs 0.133), making it easier to set thresholds and distinguish good matches from mediocre ones.
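With the wider spread, a fixed-threshold triage policy becomes practical. The cutoffs below are hypothetical, read off the distribution above (they are not shipped with or recommended by the model; tune them on your own data):

```python
def triage(score: float) -> str:
    """Map a v2 cosine similarity to a review bucket.

    Cutoffs are illustrative: in the distribution above, scores >= 0.55
    sit in the top few percent of random resume-job pairs.
    """
    if score >= 0.55:
        return "strong"    # surface to a recruiter immediately
    if score >= 0.40:
        return "review"    # plausible match, needs a human look
    return "skip"          # within the bulk of random-pair noise

print([triage(s) for s in (0.62, 0.47, 0.21)])  # ['strong', 'review', 'skip']
```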

Matryoshka Dimension Quality

Trained with Matryoshka Representation Learning — quality at different truncation levels:

| Dimensions | Good Match Score | Bad Match Score | Delta |
|---|---|---|---|
| 768 (full) | 0.691 | 0.228 | 0.462 |
| 384 (recommended) | 0.696 | 0.186 | 0.510 |

384-dim truncation actually improves separation — Matryoshka loss concentrates the most discriminative information in the first 384 dimensions.
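Matryoshka truncation itself is just slicing off the leading dimensions and re-normalizing, which is roughly what passing `truncate_dim` to `SentenceTransformer` does for you. A sketch with synthetic 768-dim vectors standing in for full-width model output:

```python
import numpy as np

def truncate(emb: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` Matryoshka dimensions and re-normalize to unit length."""
    head = emb[..., :dims]
    return head / np.linalg.norm(head, axis=-1, keepdims=True)

# Synthetic stand-ins for two full 768-dim embeddings
rng = np.random.default_rng(1)
full = rng.normal(size=(2, 768))
full /= np.linalg.norm(full, axis=1, keepdims=True)

for d in (768, 384, 256, 128):
    a, b = truncate(full, d)
    print(d, round(float(a @ b), 3))  # cosine similarity at each truncation level
```

Re-normalizing after the slice matters: without it, truncated dot products are no longer cosine similarities.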

Training Details

Data

  • 15,600 resume-job pairs scored 0.0–1.0 by Cerebras gpt-oss-120b
  • 52 candidates × 300 random jobs each
  • Scoring criteria: technical skills overlap (50%), role/title alignment (30%), experience level match (20%)

Training Pipeline

  1. Phase 1: MatryoshkaLoss wrapping MultipleNegativesRankingLoss (5 epochs)
    • Triplets: anchor (candidate) + positive (score ≥ 0.6) + hard negative (score ≤ 0.3)
    • 2,702 triplets, effective batch size 32 (8 × 4 gradient accumulation)
    • Matryoshka dims: [768, 384, 256, 128]
  2. Phase 2: CosineSimilarityLoss refinement (2 epochs)
    • Mid-range pairs (0.3 < score < 0.6) for calibrating the middle of the score range
    • 2,503 pairs, learning rate 5e-6
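The Phase 1 triplet construction can be sketched in plain Python. The field names and pairing scheme here are assumptions — the card states the score cutoffs but not how positives and negatives were combined, so the full cross product below is illustrative only:

```python
def build_triplets(pairs, pos_min=0.6, neg_max=0.3):
    """Turn LLM-scored (candidate, job, score) rows into MNRL triplets.

    For each candidate, every job scored >= pos_min is paired with every
    job scored <= neg_max as (anchor, positive, hard negative).
    Mid-range pairs are left out; Phase 2 uses them for calibration.
    """
    by_candidate = {}
    for cand, job, score in pairs:
        by_candidate.setdefault(cand, []).append((job, score))

    triplets = []
    for cand, jobs in by_candidate.items():
        positives = [j for j, s in jobs if s >= pos_min]
        negatives = [j for j, s in jobs if s <= neg_max]
        for pos in positives:
            for neg in negatives:
                triplets.append((cand, pos, neg))
    return triplets

rows = [
    ("cand_a", "job_1", 0.82),  # good match -> positive
    ("cand_a", "job_2", 0.45),  # mid-range  -> Phase 2 only
    ("cand_a", "job_3", 0.12),  # bad match  -> hard negative
]
print(build_triplets(rows))  # [('cand_a', 'job_1', 'job_3')]
```

The resulting triplets feed `MultipleNegativesRankingLoss` wrapped in `MatryoshkaLoss`, which applies the ranking objective at each of the configured dimension prefixes.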

Training Logs

Phase 1 — MatryoshkaLoss + MNRL (5 epochs):

| Epoch | Step | Loss |
|---|---|---|
| 0.12 | 10 | 2.942 |
| 1.06 | 90 | 1.856 |
| 2.00 | 170 | 1.792 |
| 3.06 | 260 | 1.757 |
| 4.00 | 340 | 1.684 |
| 4.95 | 420 | 1.634 |

Phase 2 — CosineSimilarityLoss (2 epochs):

| Epoch | Step | Loss |
|---|---|---|
| 0.13 | 10 | 0.044 |
| 0.51 | 40 | 0.009 |
| 1.01 | 80 | 0.007 |
| 1.91 | 150 | 0.006 |

Hardware

  • Training: AMD Instinct MI300X (ROCm 6.2), ~3 minutes total
  • Evaluation: AMD Instinct MI300X + Apple M-series CPU

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 5.2.3
  • Transformers: 5.3.0
  • PyTorch: 2.5.1+rocm6.2
  • Accelerate: 1.13.0
  • Datasets: 4.7.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False, 'architecture': 'MPNetModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_mean_tokens': True})
  (2): Normalize()
)

Why Fine-Tune?

Generic sentence-transformers treat resumes as arbitrary text. Fine-tuning teaches:

  • Domain vocabulary: "OPT", "H1B", "C2C" are visa types, not random acronyms
  • Structural alignment: Match skills sections to requirements sections
  • Experience calibration: "3 years Java" → "mid-level Java developer", not "senior architect"
  • Description reading: v2 matches on actual job description content, not just title keywords

General-purpose models (bge, e5) score 0.19–0.24 Spearman on resume-job matching despite leading MTEB leaderboards. Domain fine-tuning is essential for this task.

Limitations

  • English only, US tech market bias
  • 512 token limit — key information should appear early in the text
  • Trained on tech/IT recruiting data — may underperform on non-tech roles
  • LLM-scored training data inherits any biases from the scoring model

Citation

BibTeX

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

License

Apache 2.0
