مِسراج — Misraj AI

Built on Trust. Measured by Impact.
The next-generation Arabic AI lab — building the foundational infrastructure for Arabic language understanding, generation, and document intelligence.


🧭 About Us

Misraj AI is the AI research division of Misraj Technology, a Saudi-based technology group with over 10 years of experience delivering enterprise digital solutions across 15 sectors. Our AI lab is dedicated to a singular mission: making Arabic a first-class language in the modern AI era.

We develop open models, large-scale datasets, rigorous benchmarks, and production-ready AI systems — all purpose-built for Arabic, a morphologically rich language that has long been underserved by mainstream AI research.

From our research lab to operational products, we build a comprehensive system that enables governments and enterprises to adopt AI with confidence, depth, and speed.

📊 15+ research papers · 35+ billion open Arabic data tokens · Honored by AI Pioneers


🏢 Areas of Expertise

Our AI solutions span critical industry verticals, combining deep domain knowledge with state-of-the-art Arabic NLP:

  • 🏥 Healthcare Technology — Clinical documentation and Arabic medical NLP
  • 🏦 Financial Technology — Document intelligence for banking and finance
  • ⚖️ Legal Technology — Contract analysis and legal document processing
  • 🎓 Educational Technology — Arabic learning and knowledge systems
  • 🏛️ Administrative Technology — Government and enterprise document automation

📦 Open Datasets

We are committed to releasing high-quality, openly available Arabic AI resources to empower the global research community.

| Dataset | Description | Scale |
|---|---|---|
| Misraj-DocOCR | Expert-verified Arabic document OCR benchmark | 400 images |
| KITAB PDF-to-Markdown | Corrected Arabic PDF-to-Markdown corpus | 62 documents |
| msdd | Misraj Structured Document Dataset | 26.4M rows |
| mudd | Misraj Unstructured Document Dataset | 4.76M rows |
| Tarjama-25 | Bidirectional Arabic-English translation benchmark | 5,000 expert-reviewed sentence pairs |
| Arabic-Image-Captioning 100M | First large-scale Arabic multimodal captioning dataset | 100M caption pairs |
| SadeedDiac-25 | Arabic diacritization benchmark | 1.2K samples |
| Sadeed Tashkeela | Large-scale Arabic diacritization corpus | 1.05M samples |

35+ billion open Arabic data tokens released and growing.


📬 Connect With Us

| Platform | Link |
|---|---|
| 🌐 Misraj AI | misraj.ai/en |
| 🌐 Misraj Technology | misraj.sa/en |
| 🔵 Baseer OCR | baseerocr.com |
| 🤗 Hugging Face | huggingface.co/Misraj |
| 💼 LinkedIn | linkedin.com/company/aimisraj |
| 🐦 X / Twitter | @aimisraj |
| 💻 GitHub | github.com/misraj-ai |
| 📸 Instagram | @misraj__ai |