How to use with the Transformers library

```python
# Load the model and tokenizer directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("rausch/ja-t5-sci-transfer-init-spm32k")
model = AutoModelForSeq2SeqLM.from_pretrained("rausch/ja-t5-sci-transfer-init-spm32k")
```

JA-Trans-Init

Japanese scientific T5 model initialized from EN-T5-Sci using WECHSEL and a language-specific SentencePiece 32k tokenizer.

Model Details

This is one of the non-English scientific T5 transfer models from the paper. It keeps the Transformer weights of EN-T5-Sci and reinitializes the embedding layers for Japanese with WECHSEL, using a language-specific SentencePiece tokenizer as the target vocabulary.

  • Paper name: JA-Trans-Init
  • Model role: main
  • Source/base model: EN-T5-Sci
  • Code and pipeline: GitHub repository
  • Architecture: T5 encoder-decoder
  • SciLaD dataset: scilons/SciLaD-all-text-v1
  • Evaluation benchmark: Global-MMLU
  • Target-language tokenizer: language-specific SentencePiece 32k tokenizer trained on the Japanese SciLaD split

WECHSEL resources: English fastText embeddings and Japanese fastText embeddings, aligned with the Japanese bilingual dictionary.
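WECHSEL's core idea is to initialize the target-language subword embeddings from the source model: static (fastText) word embeddings for both languages are aligned with a bilingual dictionary, and each target subword embedding is then set to a similarity-weighted average of source subword embeddings. A minimal NumPy sketch of that idea follows; the vocabularies, vectors, and the softmax weighting are toy stand-ins, not the paper's actual implementation or the `wechsel` library's API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy static word embeddings for source (English) and target (Japanese).
# In WECHSEL these come from fastText; here they are random stand-ins.
src_words = {"science": rng.normal(size=4), "model": rng.normal(size=4)}
tgt_words = {"kagaku": rng.normal(size=4), "moderu": rng.normal(size=4)}

# Bilingual dictionary: (target word, source word) pairs.
dictionary = [("kagaku", "science"), ("moderu", "model")]

# 1) Align target vectors into the source space (orthogonal Procrustes).
X = np.stack([tgt_words[t] for t, _ in dictionary])  # target side
Y = np.stack([src_words[s] for _, s in dictionary])  # source side
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt  # rotation mapping target space -> source space
tgt_aligned = {w: v @ W for w, v in tgt_words.items()}

# 2) Pretrained subword embeddings of the source model (stand-in for
#    EN-T5-Sci), plus static vectors for each source subword.
src_emb = rng.normal(size=(4, 8))  # rows: "sci", "ence", "mod", "el"
src_sub_static = np.stack([src_words["science"], src_words["science"],
                           src_words["model"], src_words["model"]])

# 3) Initialize each target subword as a similarity-weighted average of
#    the source subword embeddings (softmax over cosine similarities).
def init_target(tgt_static_vec, temperature=0.1):
    a = tgt_static_vec / np.linalg.norm(tgt_static_vec)
    b = src_sub_static / np.linalg.norm(src_sub_static, axis=1, keepdims=True)
    weights = np.exp(b @ a / temperature)
    weights /= weights.sum()
    return weights @ src_emb

# New embedding rows for target subwords "ka", "gaku", "mo", "deru".
tgt_static = np.stack([tgt_aligned["kagaku"], tgt_aligned["kagaku"],
                       tgt_aligned["moderu"], tgt_aligned["moderu"]])
tgt_emb = np.stack([init_target(v) for v in tgt_static])
print(tgt_emb.shape)  # (4, 8): the new Japanese embedding matrix
```

The rest of the Transformer (attention and feed-forward weights) is carried over from EN-T5-Sci unchanged; only the embedding matrix is rebuilt for the new 32k vocabulary.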

Evaluation

Zero-shot Global-MMLU accuracy, as reported in the paper's aggregation:

| Metric          | Accuracy |
|-----------------|----------|
| Average         | 25.51    |
| STEM            | 26.26    |
| Humanities      | 27.12    |
| Social Sciences | 23.79    |
| Other           | 24.01    |
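Zero-shot multiple-choice accuracy of this kind is typically computed by scoring each answer option under the model, taking the argmax, and then aggregating per category. A minimal sketch of that bookkeeping, with a hypothetical stand-in scorer in place of the actual T5 likelihood scoring:

```python
from collections import defaultdict

# Toy question set; in the real evaluation each option is scored by the
# model (e.g. by its log-likelihood given the question).
questions = [
    {"category": "STEM", "options": ["A", "B", "C", "D"], "answer": 1},
    {"category": "STEM", "options": ["A", "B", "C", "D"], "answer": 0},
    {"category": "Humanities", "options": ["A", "B", "C", "D"], "answer": 3},
]

def score(question, option_idx):
    # Stand-in scorer: pretends the model always prefers option 1.
    return 1.0 if option_idx == 1 else 0.0

correct, total = defaultdict(int), defaultdict(int)
for q in questions:
    pred = max(range(len(q["options"])), key=lambda i: score(q, i))
    total[q["category"]] += 1
    correct[q["category"]] += int(pred == q["answer"])

per_category = {c: 100 * correct[c] / total[c] for c in total}
average = 100 * sum(correct.values()) / sum(total.values())
print(per_category)        # {'STEM': 50.0, 'Humanities': 0.0}
print(round(average, 2))   # 33.33
```

Note that if "Average" is question-weighted (as in this sketch), it need not equal the unweighted mean of the four category scores, which is consistent with the table above.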

Limitations

The model is evaluated primarily with zero-shot Global-MMLU. Downstream task-specific evaluation is recommended before deployment in specialized scientific workflows.

Citation

  • Title: Transferring Scientific English Pre-Trained Language Models to Multiple Languages Using Cross-Lingual Transfer
  • Authors: Nikolas Rauscher, Fabio Barth, Georg Rehm
  • Venue: LREC-COLING 2026, citation details TBA after publication
Model size: 0.2B parameters (tensor type F32, safetensors).