🌐 mt5-ossetian-translator

A fine-tuned mT5 model for Ossetian machine translation. The model was trained for 10 epochs with the Hugging Face Trainer. Validation loss plateaus at about 2.47 from epoch 8 onward, and training loss levels off near 2.2 over the last two epochs.

📊 Training Metrics

Starting training...
 [1130/1130 27:40, Epoch 10/10]
Epoch	Training Loss	Validation Loss
1	4.760233	3.660645
2	3.665481	3.144127
3	3.284355	2.871792
4	2.910443	2.698959
5	2.716799	2.608596
6	2.567515	2.554176
7	2.334619	2.507996
8	2.326849	2.465583
9	2.173708	2.468309
10	2.190471	2.469198
Writing model shards: 100%
 1/1 [00:25<00:00, 25.28s/it]
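
The loss table can be checked programmatically. The sketch below copies the logged values, finds the epoch with the lowest validation loss, and converts that loss to a perplexity. The perplexity reading assumes the logged loss is the mean per-token cross-entropy in nats, which is the Trainer default for seq2seq models; the function name `parse_log` is hypothetical.

```python
import math

# Loss table copied from the training log: epoch, training loss, validation loss.
LOG = """
1 4.760233 3.660645
2 3.665481 3.144127
3 3.284355 2.871792
4 2.910443 2.698959
5 2.716799 2.608596
6 2.567515 2.554176
7 2.334619 2.507996
8 2.326849 2.465583
9 2.173708 2.468309
10 2.190471 2.469198
"""


def parse_log(text):
    """Turn whitespace-separated log rows into (epoch, train_loss, val_loss) tuples."""
    rows = []
    for line in text.strip().splitlines():
        epoch, train_loss, val_loss = line.split()
        rows.append((int(epoch), float(train_loss), float(val_loss)))
    return rows


rows = parse_log(LOG)

# Epoch with the lowest validation loss.
best_epoch, _, best_val_loss = min(rows, key=lambda row: row[2])

# Assumption: the loss is mean token-level cross-entropy in nats, so exp(loss) is perplexity.
best_perplexity = math.exp(best_val_loss)

# First epoch where training loss drops below validation loss.
crossover_epoch = next(epoch for epoch, train, val in rows if train < val)

print(f"Best validation loss {best_val_loss:.4f} at epoch {best_epoch}")
print(f"Approximate validation perplexity: {best_perplexity:.2f}")
print(f"Training loss falls below validation loss at epoch {crossover_epoch}")
```

If checkpoints were saved per epoch, the epoch reported here is a reasonable candidate for the one to keep, since later epochs do not improve validation loss.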

Convergence Notes:

  • Training loss decreased from 4.76 to 2.19, with a low of 2.17 at epoch 9
  • Validation loss fell from 3.66 to 2.55 over the first 6 epochs and stabilized around 2.46–2.47 from epoch 8, indicating effective learning without severe overfitting.
  • Total training steps: 1,130 (113 steps/epoch × 10 epochs)
  • Total training time: 27:40, or about 2 min 46 s per epoch
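
The step and timing figures can be cross-checked against the progress line in the training log, `[1130/1130 27:40, Epoch 10/10]`. The sketch below parses that line and derives per-epoch values. It assumes the elapsed time is in `mm:ss` form, as in this log; longer runs may print hours as well, which this hypothetical `summarize_progress` helper does not handle.

```python
import re

# Progress line copied from the training log.
PROGRESS = "[1130/1130 27:40, Epoch 10/10]"

PATTERN = re.compile(r"\[(\d+)/(\d+)\s+(\d+):(\d+),\s+Epoch\s+(\d+)/(\d+)\]")


def summarize_progress(line):
    """Derive step and timing statistics from a Trainer progress line (mm:ss assumed)."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized progress line: {line!r}")
    _, total_steps, minutes, seconds, _, total_epochs = map(int, match.groups())
    elapsed_seconds = minutes * 60 + seconds
    return {
        "total_steps": total_steps,
        "total_epochs": total_epochs,
        "steps_per_epoch": total_steps / total_epochs,
        "elapsed_seconds": elapsed_seconds,
        "seconds_per_epoch": elapsed_seconds / total_epochs,
        "seconds_per_step": elapsed_seconds / total_steps,
    }


summary = summarize_progress(PROGRESS)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The first number in the bracket is the step count for the whole run, not per epoch, so dividing by the epoch count gives the steps per epoch.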

βš™οΈ Training Configuration

  • Base Model: google/mt5-[base/small/large] (update to your exact variant)
  • Task: Machine Translation ([Source Language] ↔ Ossetian)
  • Framework: Hugging Face transformers + Trainer
  • Hardware: [e.g., 1× NVIDIA A100 40GB / Colab Pro / etc.]
  • Output Format: Sharded checkpoints pushed to Hugging Face Hub
  • Dataset: [Link or name of your translation dataset]

🚀 Usage

🔹 Direct Inference

import os
import json
import torch
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, T5Tokenizer

MODEL_ID = "ajsbsd/mt5-ossetian-translator"

# snapshot_download uses the locally saved Hugging Face token, if one exists.
# Interrupted downloads resume automatically in recent huggingface_hub releases,
# so the deprecated resume_download flag is omitted.
print("⬇️ Downloading model (with auth)...")
local_path = snapshot_download(
    repo_id=MODEL_ID,
    ignore_patterns=["*.pt"]  # Skip training artifacts such as optimizer.pt
)

print(f"📁 Model downloaded to: {local_path}")

# πŸ” Find the SentencePiece model file
spm_candidates = ["spiece.model", "sentencepiece.bpe.model", "tokenizer.model"]
spm_file = None
for candidate in spm_candidates:
    path = os.path.join(local_path, candidate)
    if os.path.exists(path):
        spm_file = path
        print(f"βœ… Found SentencePiece model: {spm_file}")
        break

if not spm_file:
    print("⚠️ No SentencePiece model found β€” falling back to base mT5 tokenizer")
    tokenizer = AutoTokenizer.from_pretrained("google/mt5-small", use_fast=False)
else:
    # βœ… Load tokenizer with EXPLICIT string path (critical!)
    print("πŸ”§ Loading tokenizer with explicit spm path...")
    tokenizer = T5Tokenizer(
        vocab_file=str(spm_file),  # ← Must be string, not Path object
        eos_token="</s>",
        unk_token="<unk>",
        pad_token="<pad>",
        extra_ids=100,  # mT5 uses 100 sentinel tokens
        legacy=True
    )

# Patch config to avoid future issues
config_file = os.path.join(local_path, "tokenizer_config.json")
if os.path.exists(config_file):
    with open(config_file, "r", encoding="utf-8") as f:
        config = json.load(f)
    # Fix known issues
    if "extra_special_tokens" in config and isinstance(config["extra_special_tokens"], list):
        config["extra_special_tokens"] = {}
    if "vocab_file" in config and config["vocab_file"] is None:
        config["vocab_file"] = str(spm_file) if spm_file else "spiece.model"
    if "spm_model_file" in config and config["spm_model_file"] is None:
        config["spm_model_file"] = str(spm_file) if spm_file else "spiece.model"
    with open(config_file, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, ensure_ascii=False)
    print("🔧 Patched tokenizer_config.json")

# Load model
print("📦 Loading model weights...")
model = AutoModelForSeq2SeqLM.from_pretrained(local_path)
if torch.cuda.is_available():
    model = model.to("cuda")
    print("✅ Model moved to CUDA")

# 🧪 Run translation test
prompt = "translate english to ossetian: Hello, how are you?"
print(f"\n🔄 Translating: '{prompt}'")

inputs = tokenizer(prompt, return_tensors="pt")
if torch.cuda.is_available():
    inputs = inputs.to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    num_beams=4,
    early_stopping=True,
    no_repeat_ngram_size=2
)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"✅ Output: {result}")
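
The example prompt above follows the pattern `translate <source> to ossetian: <text>`. A small helper can keep that prefix consistent across calls. This is a sketch, not part of the model's API: `build_prompt` is a hypothetical name, the whitespace normalization is an assumption, and which source-language names the model was actually trained with is not stated in this card.

```python
def build_prompt(text, source_lang="english", target_lang="ossetian"):
    """Build a task-prefixed prompt in the same form as the example above.

    The prefix format is copied from the inference example; the accepted
    source-language names are an assumption.
    """
    # Collapse runs of whitespace and newlines so the prompt is a single line.
    cleaned = " ".join(text.split())
    if not cleaned:
        raise ValueError("text must not be empty")
    return f"translate {source_lang} to {target_lang}: {cleaned}"


example = build_prompt("Hello,   how are\nyou?")
print(example)
```

The returned string can be passed to `tokenizer(...)` in place of the hard-coded `prompt`.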

Using pipeline

from transformers import pipeline

MODEL_ID = "ajsbsd/mt5-ossetian-translator"

# Pass the same repo ID for both the model and the tokenizer
translator = pipeline("translation", model=MODEL_ID, tokenizer=MODEL_ID)
result = translator("translate [source_lang] to ossetian: Your text here.")
print(result[0]["translation_text"])
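
For more than a handful of sentences, it can help to send prompts to the pipeline in fixed-size batches. The sketch below assumes the translator is a callable that takes a list of prompts and returns one dict per prompt with a `translation_text` key, which is how the translation pipeline behaves for list input. `translate_in_batches` is a hypothetical helper, and `fake_translator` is a stand-in so the example runs without downloading the model.

```python
def translate_in_batches(translator, prompts, batch_size=8):
    """Call `translator` on successive slices of `prompts` and collect the text.

    `translator` is assumed to accept a list of strings and return a list of
    dicts with a "translation_text" key, one per input.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    prompts = list(prompts)
    translations = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        outputs = translator(batch)
        translations.extend(item["translation_text"] for item in outputs)
    return translations


def fake_translator(batch):
    """Stand-in for the real pipeline; wraps each prompt in angle brackets."""
    return [{"translation_text": f"<{prompt}>"} for prompt in batch]


demo = translate_in_batches(fake_translator, ["a", "b", "c"], batch_size=2)
print(demo)
```

With the real model, the `translator` object created by `pipeline(...)` would be passed in place of `fake_translator`.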

Citation

@misc{mt5_ossetian_translator,
  title={mt5-ossetian-translator},
  author={ajsbsd},
  year={2026},
  url={https://huggingface.co/ajsbsd/mt5-ossetian-translator}
}

Base model (per the Hub model tree): google/mt5-small