alakxender/dhivehi-image-text
A fine-tuned TrOCR model for Dhivehi (Maldivian) text recognition using Thaana script.
```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel, PreTrainedTokenizerFast
from PIL import Image
import torch

model_id = "Serialtechlab/dhivehi-trocr-base-handwritten"
processor = TrOCRProcessor.from_pretrained(model_id)
model = VisionEncoderDecoderModel.from_pretrained(model_id)
tokenizer = PreTrainedTokenizerFast.from_pretrained(model_id)

# Load the input image and convert it to 3-channel RGB for the image processor
image = Image.open("dhivehi_text.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Beam-search decoding; no gradients are needed at inference time
with torch.no_grad():
    generated_ids = model.generate(pixel_values, max_length=128, num_beams=4)

# Map IDs back to token strings, drop special tokens, and concatenate the rest
tokens = tokenizer.convert_ids_to_tokens(generated_ids[0])
special = [tokenizer.pad_token, tokenizer.bos_token, tokenizer.eos_token, tokenizer.unk_token]
text = "".join(t for t in tokens if t not in special)
print(text)
```
Fine-tuned on Google Colab (A100 GPU) for 6 epochs.
Base model: microsoft/trocr-base-handwritten