| --- |
| license: mit |
| language: |
| - en |
| metrics: |
| - accuracy |
| - bleu |
| pipeline_tag: table-question-answering |
| tags: |
| - code |
| --- |
| # TableDART Gating Network Checkpoint |
|
|
| This repository provides the trained gating network checkpoint for **TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding**. |
|
|
| TableDART is a training-efficient framework that dynamically routes each table-query pair through the most appropriate of three reasoning paths (Text-only, Image-only, or Fusion) while keeping all pretrained expert models **frozen**. |
|
|
| --- |
|
|
| ## Overview |
|
|
| Modeling semantic and structural information from tabular data remains a core challenge for effective table understanding. |
| Existing LLM-based approaches face several limitations: |
|
|
| - Table-as-Text methods flatten tables into text sequences, losing structural cues. |
| - Table-as-Image methods preserve layout but struggle with precise semantics. |
| - Static multimodal methods process all modalities for every query, introducing redundancy and potential cross-modal conflicts. |
| - Most approaches require expensive fine-tuning of large LLMs or multimodal models. |
|
|
| **Our solution:** TableDART addresses these limitations by: |
|
|
| - Reusing pretrained single-modality expert models (kept frozen, plug-and-play) |
| - Learning only a lightweight 2.59M-parameter MLP gating network |
| - Dynamically selecting the optimal path for each table-query pair (instance-level) |
| - Introducing an LLM agent that mediates cross-modal knowledge integration when needed |
|
|
| This design avoids full LLM/MLLM fine-tuning, reduces computational redundancy, and maintains a strong efficiency-performance trade-off. |
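
The instance-level routing described above can be illustrated with a minimal sketch. It assumes each table-query pair has already been encoded as a fixed-length feature vector and that a small MLP scores the three paths. The names `TinyGate`, `route`, and `PATHS`, the layer sizes, and the random weights are all hypothetical; this is not the released 2.59M-parameter gating network, whose architecture and loading code are in the official repository.

```python
import math
import random

# The three reasoning paths named in this model card.
PATHS = ("text_only", "image_only", "fusion")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyGate:
    """Hypothetical one-hidden-layer MLP gate (illustrative sizes and random weights)."""

    def __init__(self, in_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
                   for _ in range(len(PATHS))]
        self.b2 = [0.0] * len(PATHS)

    def forward(self, features):
        # Hidden layer with ReLU activation.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
                  for row, b in zip(self.w1, self.b1)]
        # One logit per reasoning path, normalized to probabilities.
        logits = [sum(w * h for w, h in zip(row, hidden)) + b
                  for row, b in zip(self.w2, self.b2)]
        return softmax(logits)


def route(gate, features):
    """Pick the highest-probability path for one table-query pair."""
    probs = gate.forward(features)
    best = max(range(len(PATHS)), key=probs.__getitem__)
    return PATHS[best], probs


# Assumed input: an 8-dimensional feature vector for a single table-query pair.
gate = TinyGate(in_dim=8, hidden_dim=16)
path, probs = route(gate, [0.5] * 8)
print(path, [round(p, 3) for p in probs])
```

In this sketch only the gate has parameters of its own; the frozen experts would be called after `route` returns, so only the selected path does any work for a given query.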
|
|
| --- |
|
|
| ## Performance |
|
|
| Across 7 benchmarks, TableDART: |
|
|
| - Achieves state-of-the-art results on 4/7 benchmarks among open-source models |
| - Outperforms the strongest baseline by +4.02% accuracy on average |
| - Maintains significant computational efficiency gains |
|
|
|
|
| ## What This Checkpoint Contains |
|
|
| This Hugging Face model includes: |
|
|
| - The trained MLP gating network checkpoint |
|
|
| ⚠️ Note: This checkpoint does not include the pretrained text or image expert models. Please load those separately according to the official repository instructions. |
|
|
| --- |
|
|
| ## Code and Usage |
|
|
| Full training scripts, inference pipelines, and reproduction details are available in our GitHub repository: https://github.com/xiaobo-xing/TableDART |
|
|
| --- |
|
|
| ## Paper |
|
|
| ICLR 2026 OpenReview Version: |
| https://openreview.net/forum?id=4aZTiLH3fm |
|
|
| ArXiv Version: |
| https://arxiv.org/abs/2509.14671 |
|
|
| --- |
|
|
| ## Citation |
|
|
| If you find TableDART helpful, please cite our paper and consider starring the repository. |
|
|
| ### ICLR 2026 Version |
|
|
| ```bibtex |
| @inproceedings{xing2026tabledart, |
| title={Table{DART}: Dynamic Adaptive Multi-Modal Routing for Table Understanding}, |
| author={Xiaobo Xing and Wei Yuan and Tong Chen and Quoc Viet Hung Nguyen and Xiangliang Zhang and Hongzhi Yin}, |
| booktitle={The Fourteenth International Conference on Learning Representations}, |
| year={2026}, |
| url={https://openreview.net/forum?id=4aZTiLH3fm} |
| } |
| ``` |
|
|
| ### ArXiv Version |
| ```bibtex |
| @misc{xing2025tabledartdynamicadaptivemultimodal, |
| title={TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding}, |
| author={Xiaobo Xing and Wei Yuan and Tong Chen and Quoc Viet Hung Nguyen and Xiangliang Zhang and Hongzhi Yin}, |
| year={2025}, |
| eprint={2509.14671}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.CL}, |
| url={https://arxiv.org/abs/2509.14671} |
| } |
| ``` |