OriOn Collection Visual long document VLMs based on Mistral-Small-3.1-24B-Instruct-2503 and Qwen3-VL-32B-Instruct • 4 items • Updated 2 days ago • 4
LateOn-Code 💻 Collection State-of-the-art late interaction code retrieval models • 6 items • Updated 2 days ago • 14
LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR Paper • 2601.14251 • Published Jan 20 • 25
LightOnOCR-2 🦉 Collection LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family • 12 items • Updated 2 days ago • 22
Article LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family Jan 19 • 87
Article LightOnOCR-1B: The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR Oct 23, 2025 • 73
Article Introducing EuroBERT: A High-Performance Multilingual Encoder Model Mar 10, 2025 • 146
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated Dec 19, 2024 • 159
Article ArabicWeb24: Creating a High Quality Arabic Web-only Pre-training Dataset Aug 8, 2024 • 11