LightOnOCR-2-1B-Poneglyph

LightOnOCR-2-1B-Poneglyph is a specialized end-to-end vision-language model fine-tuned for high-precision OCR, specifically targeting the font used in French editions of the One Piece manga.

Performance & Precision

By training on a focused dataset of dialogue-bubble crops set in a single font, the model achieves 0% CER (Character Error Rate) and 0% WER (Word Error Rate) on its evaluation data.
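For readers unfamiliar with the two metrics, here is a minimal sketch of how CER and WER are commonly computed from a Levenshtein edit distance. It is not the evaluation script used for this model. The function names and the sample strings are illustrative assumptions, and the aggregation over a whole evaluation set (per-sample average versus total edits over total length) is not specified in this card.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical bubble text, not taken from the training or evaluation data.
    truth = "LE ROI DES PIRATES"
    perfect = "LE ROI DES PIRATES"
    one_typo = "LE R0I DES PIRATES"  # the letter O misread as the digit 0

    print(cer(truth, perfect), wer(truth, perfect))
    print(cer(truth, one_typo), wer(truth, one_typo))
```

A score of 0% on both metrics means every predicted transcription matched its reference exactly, character for character. A single misread character in a four-word bubble, as in the second example, already pushes WER to 25%.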

The training set comprises nearly 5,000 high-quality crops of dialogue bubbles, specifically curated to minimize background noise and prioritize clear text extraction.

Project Context

This model was developed for Projet Poneglyph. While the training data is highly specific, this narrow focus is intentional: it allows the model to achieve near-perfect accuracy for this unique use case.

Note on Generalization: This model has not been benchmarked on general-purpose datasets. Given its hyper-specialization, it may perform worse on standard document layouts. It is strictly intended for OCR tasks involving manga text bubbles.

Acknowledgments

Special thanks to the LightOnAI team for releasing the LightOnOCR-2 family, which provides a robust foundation for specialized fine-tuning, particularly because of its native proficiency with the French language.

@misc{lightonocr2_2026,
  title        = {LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR},
  author       = {Said Taghadouini and Adrien Cavaill\`{e}s and Baptiste Aubertin},
  year         = {2026},
  howpublished = {\url{https://arxiv.org/abs/2601.14251}}
}
Safetensors · Model size: 1B params · Tensor type: BF16