Chart
- ahmed-masry/ChartQA • Updated Jun 22, 2024 • 32.7k • 1.69k • 31
- oroikon/chart_captioning • Updated Oct 8, 2023 • 8.82k • 96 • 12
- heegyu/chart2text_statista • Updated Oct 12, 2023 • 34.8k • 1.63k • 9
- nourheshamshaheen/typed_final_chart_to_table • Updated Nov 12, 2023 • 2.81k • 52 • 6
Document Understanding Models
- Mizukiluke/ureader-instruction-1.0 • Updated Oct 13, 2023 • 24.5k • 54 • 15
Captioning
- docling-project/USPTO-30K • Updated Aug 24, 2023 • 30k • 339 • 9
- MMInstruction/ArxivCap • Updated Oct 3, 2024 • 573k • 55.7k • 58
- mPLUG/M-Paper • Updated Jan 13, 2024 • 491 • 13
DocQA
- jp1924/DocStruct4M • Updated Feb 5, 2025 • 3.05M • 48 • 4
- howard-hou/OCR-VQA • Updated Apr 24, 2023 • 208k • 2.27k • 60
- vikhyatk/docmatix-single • Updated Jul 19, 2024 • 565k • 45 • 6
- MMInstruction/ArxivQA • Updated Mar 5, 2024 • 100k • 268 • 38
Page to MD
A dataset of image-text pairs sourced from research papers on arXiv, where each image is derived from a PDF page and paired with its corresponding OCR
- v1v1d/Arxiv_MD_v2_2k • Updated Jun 24, 2024 • 3.04k • 15
- v1v1d/Arxiv_MD_v2 • Updated Jun 24, 2024 • 14.2k • 96
- v1v1d/Arxiv_MD_v1_1k • Updated Jun 23, 2024 • 1.14k • 20
- v1v1d/Arxiv_MD_v1 • Updated Jun 18, 2024 • 9.96k • 63
OCR
- wendlerc/RenderedText • Updated Oct 23, 2025 • 12M • 13k • 58
- Salesforce/blip3-ocr-200m • Updated Feb 3, 2025 • 96M • 1.09k • 44
- openpecha/OCR-Google_Books • Updated Oct 20, 2025 • 751k • 309
- openpecha/OCR-Norbuketaka • Updated Oct 14, 2025 • 2.24M • 27
Table Extraction
- docling-project/PubTables-1M_OTSL • Updated Aug 31, 2023 • 1.88M • 2.1k • 7
- docling-project/PubTabNet_OTSL • Updated Aug 31, 2023 • 395k • 2.26k • 5
- docling-project/FinTabNet_OTSL • Updated Aug 31, 2023 • 109k • 607 • 7
Layout Detection
- docling-project/DocLayNet-v1.1 • Updated Sep 1, 2023 • 63.5k • 1.82k • 27
- docling-project/DocLayNet • Updated Jan 25, 2023 • 629 • 139
- vikp/doclaynet_processed • Updated Nov 30, 2023 • 80.9k • 640 • 6
- psyche/publaynet • Updated Jul 30, 2024 • 347k • 119
VQA
- wyu1/Leopard-Instruct • Updated Nov 8, 2024 • 1.03M • 72.9k • 65
- neulab/PangeaInstruct • Updated Feb 2, 2025 • 393 • 86
- MMInstruction/ArxivQA • Updated Mar 5, 2024 • 100k • 268 • 38
- vidore/arxivqa_train • Updated Jun 20, 2025 • 95k • 157
Latex Extract
A dataset collection of image-text pairs, where each image contains mathematical formulas, and each corresponding text provides the relevant LaTeX
- v1v1d/Latexify_v1_clean • Updated Jul 29, 2024 • 11k • 38 • 1
- v1v1d/Latexify_v1 • Updated Jul 29, 2024 • 234k • 19 • 1
- OleehyO/latex-formulas • Updated Aug 13, 2025 • 1.56M • 461 • 101
- unsloth/LaTeX_OCR • Updated Nov 21, 2024 • 76.3k • 4.26k • 82