🌐 mt5-ossetian-translator

A fine-tuned mT5 model for Ossetian machine translation. The model was trained for 10 epochs with the Hugging Face Trainer. Validation loss plateaus at about 2.47 from epoch 8 onward, and training loss levels off near 2.2 over the last two epochs.

📊 Training Metrics

Starting training...
 [1130/1130 27:40, Epoch 10/10]
Epoch	Training Loss	Validation Loss
1	4.760233	3.660645
2	3.665481	3.144127
3	3.284355	2.871792
4	2.910443	2.698959
5	2.716799	2.608596
6	2.567515	2.554176
7	2.334619	2.507996
8	2.326849	2.465583
9	2.173708	2.468309
10	2.190471	2.469198
Writing model shards: 100%
 1/1 [00:25<00:00, 25.28s/it]
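The validation losses in the log can be turned into perplexities, which are sometimes easier to compare across runs. The snippet below is a minimal sketch, assuming the logged values are mean per-token cross-entropy in nats (the usual case for seq2seq models trained with the Trainer, but not confirmed for this run). The names `val_loss` and `perplexity` are illustrative only.

```python
import math

# Validation losses copied from the training log (epochs 1-10).
val_loss = [3.660645, 3.144127, 2.871792, 2.698959, 2.608596,
            2.554176, 2.507996, 2.465583, 2.468309, 2.469198]


def perplexity(loss: float) -> float:
    """Convert a mean per-token cross-entropy (assumed to be in nats) to perplexity."""
    return math.exp(loss)


# Pick the epoch with the lowest validation loss (epochs are 1-indexed).
best_epoch, best_loss = min(enumerate(val_loss, start=1), key=lambda pair: pair[1])
best_ppl = perplexity(best_loss)

print(f"best epoch: {best_epoch}, loss: {best_loss:.4f}, perplexity: {best_ppl:.2f}")
```

If the checkpoint on the Hub is the final one (epoch 10) rather than the best one, its validation loss differs from the minimum only in the third decimal place.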

Convergence Notes:

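Training loss decreased from 4.76 to 2.19 over 10 epochs, with a minimum of 2.17 at epoch 9.
Validation loss fell from 3.66 to 2.55 over the first 6 epochs and stabilized at about 2.47 from epoch 8 onward (minimum 2.4656 at epoch 8). The final gap between training loss (2.19) and validation loss (2.47) is modest, which suggests effective learning without severe overfitting.
Total training steps: 1,130 (113 steps/epoch × 10 epochs)
Total training time: 27:40, or roughly 2 minutes 46 seconds per epoch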
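The step and timing figures can be re-derived from the progress line in the log, `[1130/1130 27:40, Epoch 10/10]`. The following sketch assumes the elapsed time is in `mm:ss` format, which is how the Trainer's notebook progress bar prints durations under one hour. The variable names are illustrative.

```python
# Values read from the progress line "[1130/1130 27:40, Epoch 10/10]".
total_steps = 1130
epochs = 10
elapsed = "27:40"  # assumed to be mm:ss

minutes, seconds = map(int, elapsed.split(":"))
total_seconds = minutes * 60 + seconds

steps_per_epoch = total_steps // epochs
seconds_per_epoch = total_seconds / epochs
seconds_per_step = total_seconds / total_steps

print(f"steps per epoch:   {steps_per_epoch}")
print(f"seconds per epoch: {seconds_per_epoch:.0f}")
print(f"seconds per step:  {seconds_per_step:.2f}")
```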

⚙️ Training Configuration

Base Model: google/mt5-[base/small/large] (update to your exact variant)
Task: Machine Translation ([Source Language] ↔ Ossetian)
Framework: Hugging Face transformers + Trainer
Hardware: [e.g., 1× NVIDIA A100 40GB / Colab Pro / etc.]
Output Format: Sharded checkpoints pushed to Hugging Face Hub
Dataset: [Link or name of your translation dataset]

🚀 Usage

🔹 Direct Inference

import os
import json
import torch
from pathlib import Path
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, T5Tokenizer

MODEL_ID = "ajsbsd/mt5-ossetian-translator"

print("⬇️ Downloading model (uses your saved Hugging Face token, if any)...")
local_path = snapshot_download(
    repo_id=MODEL_ID,
    # Skip training artifacts such as optimizer.pt.
    # Interrupted downloads resume automatically, so resume_download is not needed.
    ignore_patterns=["*.pt"],
)

print(f"📁 Model downloaded to: {local_path}")

# 🔍 Find the SentencePiece model file
spm_candidates = ["spiece.model", "sentencepiece.bpe.model", "tokenizer.model"]
spm_file = None
for candidate in spm_candidates:
    path = os.path.join(local_path, candidate)
    if os.path.exists(path):
        spm_file = path
        print(f"✅ Found SentencePiece model: {spm_file}")
        break

if not spm_file:
    print("⚠️ No SentencePiece model found — falling back to base mT5 tokenizer")
    tokenizer = AutoTokenizer.from_pretrained("google/mt5-small", use_fast=False)
else:
    # ✅ Load tokenizer with EXPLICIT string path (critical!)
    print("🔧 Loading tokenizer with explicit spm path...")
    tokenizer = T5Tokenizer(
        vocab_file=str(spm_file),  # ← Must be string, not Path object
        eos_token="</s>",
        unk_token="<unk>",
        pad_token="<pad>",
        extra_ids=100,  # mT5 uses 100 sentinel tokens
        legacy=True
    )

# Patch config to avoid future issues
config_file = os.path.join(local_path, "tokenizer_config.json")
if os.path.exists(config_file):
    with open(config_file, "r", encoding="utf-8") as f:
        config = json.load(f)
    # Fix known issues
    if "extra_special_tokens" in config and isinstance(config["extra_special_tokens"], list):
        config["extra_special_tokens"] = {}
    if "vocab_file" in config and config["vocab_file"] is None:
        config["vocab_file"] = str(spm_file) if spm_file else "spiece.model"
    if "spm_model_file" in config and config["spm_model_file"] is None:
        config["spm_model_file"] = str(spm_file) if spm_file else "spiece.model"
    with open(config_file, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, ensure_ascii=False)
    print("🔧 Patched tokenizer_config.json")

# Load model
print("📦 Loading model weights...")
model = AutoModelForSeq2SeqLM.from_pretrained(local_path)
if torch.cuda.is_available():
    model = model.to("cuda")
    print("✅ Model moved to CUDA")

# 🧪 Run translation test
prompt = "translate english to ossetian: Hello, how are you?"
print(f"\n🔄 Translating: '{prompt}'")

inputs = tokenizer(prompt, return_tensors="pt")
if torch.cuda.is_available():
    inputs = inputs.to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    num_beams=4,
    early_stopping=True,
    no_repeat_ngram_size=2
)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"✅ Output: {result}")
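The example prompt uses a T5-style task prefix (`translate english to ossetian: ...`). A small helper can keep that prefix consistent across calls. This is a sketch only: `build_prompt` is a hypothetical name, and the card does not state which source languages or exact prefix wording were used during fine-tuning, so the defaults simply mirror the example prompt.

```python
def build_prompt(text: str, source_lang: str = "english", target_lang: str = "ossetian") -> str:
    """Build a task-prefixed prompt in the same shape as the example prompt.

    The prefix format is assumed from the example; confirm it matches the
    format used in the training data before relying on it.
    """
    # Collapse runs of whitespace so the model sees a single clean line.
    cleaned = " ".join(text.split())
    if not cleaned:
        raise ValueError("text must contain at least one non-whitespace character")
    return f"translate {source_lang.lower()} to {target_lang.lower()}: {cleaned}"


prompt = build_prompt("Hello, how are you?")
print(prompt)
```

The resulting string can be passed to `tokenizer(prompt, return_tensors="pt")` exactly as in the script.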

🔹 Using pipeline

from transformers import pipeline

MODEL_ID = "ajsbsd/mt5-ossetian-translator"

# Use the same repo ID for both the model and the tokenizer
translator = pipeline("translation", model=MODEL_ID, tokenizer=MODEL_ID)
result = translator("translate [source_lang] to ossetian: Your text here.")
print(result[0]["translation_text"])
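To translate several sentences, the pipeline can be called on small batches. The sketch below assumes `translator` is any callable that takes a list of strings and returns a list of dicts with a `translation_text` key, which is the shape the pipeline call returns for a single string. `translate_many` and its `prefix` default are hypothetical, and the stand-in translator at the end exists only so the example runs without downloading the model.

```python
from typing import Callable, Dict, Iterable, List


def translate_many(
    translator: Callable[[List[str]], List[Dict[str, str]]],
    texts: Iterable[str],
    batch_size: int = 8,
    prefix: str = "translate english to ossetian: ",
) -> List[str]:
    """Translate texts in fixed-size batches, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    items = list(texts)
    results: List[str] = []
    for start in range(0, len(items), batch_size):
        # Prepend the task prefix to every sentence in this batch.
        batch = [prefix + text for text in items[start:start + batch_size]]
        outputs = translator(batch)
        results.extend(output["translation_text"] for output in outputs)
    return results


# Stand-in translator for demonstration; replace with the real pipeline object.
def fake_translator(batch: List[str]) -> List[Dict[str, str]]:
    return [{"translation_text": text.upper()} for text in batch]


print(translate_many(fake_translator, ["one", "two", "three"], batch_size=2))
```

With the real model, pass the `translator` object created by `pipeline(...)` in place of `fake_translator`.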

📝 Citation

@misc{mt5_ossetian_translator,
  title={mt5-ossetian-translator},
  author={ajsbsd},
  year={2026},
  url={https://huggingface.co/ajsbsd/mt5-ossetian-translator}
}

Model size: 0.6B params (Safetensors, tensor type F32)
Base model: google/mt5-small