---
license: cc-by-sa-4.0
language:
- en
- ja
tags:
- translation
---

# FuguMT

This is an English-to-Japanese translation model built with Marian-NMT.
For more details, please see [my repository](https://github.com/s-taka/fugumt).

* source language: en
* target language: ja

### How to use

This model requires transformers and sentencepiece. In a notebook, install them with:

```python
!pip install transformers sentencepiece
```

You can use this model directly with a pipeline:

```python
from transformers import pipeline

fugu_translator = pipeline('translation', model='staka/fugumt-en-ja')
print(fugu_translator('This is a cat.'))
```
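
As an alternative to the pipeline, the model can also be loaded directly with `AutoTokenizer` and `AutoModelForSeq2SeqLM`. This may help if the installed transformers version does not provide the `translation` pipeline task. The following is a minimal sketch, not an official recipe: the helper names `load_fugumt` and `translate` and the `max_new_tokens` default are illustrative assumptions.

```python
MODEL_NAME = "staka/fugumt-en-ja"


def load_fugumt(model_name=MODEL_NAME):
    """Load the tokenizer and model (hypothetical helper, not part of any library)."""
    # Imported here so that `translate` can be exercised without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    return tokenizer, model


def translate(sentences, tokenizer, model, max_new_tokens=128):
    """Translate a list of English sentences and return a list of strings."""
    # Tokenize the whole batch at once; padding aligns sentences of different length.
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    # max_new_tokens=128 is an assumed cap on output length, not a tuned value.
    output_ids = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

For example, `translate(['This is a cat.'], *load_fugumt())` should return a list containing one Japanese string. Returning tensors with `return_tensors="pt"` assumes PyTorch is installed.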

To translate text that contains multiple sentences, we recommend splitting it into sentences with [pySBD](https://github.com/nipunsadvilkar/pySBD) first.

```python
!pip install transformers sentencepiece pysbd

import pysbd
from transformers import pipeline

# Split the English text into sentences, then translate them as a batch.
seg_en = pysbd.Segmenter(language="en", clean=False)
fugu_translator = pipeline('translation', model='staka/fugumt-en-ja')

txt = 'This is a cat. It is very cute.'
print(fugu_translator(seg_en.segment(txt)))
```
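
The translation pipeline returns one dictionary per input sentence, with the translated text under the `translation_text` key. To get a single Japanese string back, the per-sentence results can be joined again. The sketch below shows one way to do this; the function name `translate_text` and the choice to join without a separator are assumptions made for illustration.

```python
def translate_text(text, segmenter, translator, sep=""):
    """Segment `text`, translate each sentence, and join the translations.

    `segmenter` is any object with a `segment(text)` method (for example a
    `pysbd.Segmenter`), and `translator` is a callable such as a transformers
    translation pipeline. Both are passed in, so nothing is loaded here.
    """
    # Strip surrounding whitespace and drop empty segments before translating.
    sentences = [s.strip() for s in segmenter.segment(text) if s.strip()]
    if not sentences:
        return ""
    results = translator(sentences)
    # Assumption: Japanese sentences are joined without spaces by default.
    return sep.join(r["translation_text"] for r in results)
```

With the objects from the example above, this would be called as `translate_text(txt, seg_en, fugu_translator)`.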

### Eval results

Evaluation on 500 randomly selected sentences from [tatoeba](https://tatoeba.org/ja) gave the following result:

|source |target |BLEU(*1)|
|-------|-------|--------|
|en |ja |32.7 |

(*1) sacrebleu --tokenize ja-mecab
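
An evaluation of this kind could be sketched in Python as follows. Only the sample size (500) and the `ja-mecab` tokenizer come from the description above. The data layout (a list of `(english, japanese)` pairs), the seed, and the helper names are assumptions, and this is not necessarily the procedure that produced the reported score.

```python
import random


def sample_pairs(pairs, k=500, seed=0):
    """Pick k distinct (source, reference) pairs reproducibly.

    The seed is an arbitrary assumption for this sketch.
    """
    rng = random.Random(seed)
    return rng.sample(list(pairs), k)


def bleu_ja(hypotheses, references):
    """Corpus BLEU with MeCab tokenization, mirroring `--tokenize ja-mecab`.

    Needs sacrebleu with Japanese support, e.g. `pip install "sacrebleu[ja]"`.
    """
    import sacrebleu  # imported lazily so sampling works without it

    # corpus_bleu expects a list of reference streams; one stream is used here.
    return sacrebleu.corpus_bleu(hypotheses, [references], tokenize="ja-mecab").score
```

The sampled English sentences would be translated with the model and the outputs passed to `bleu_ja` together with the Japanese references.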