XDoc: Unified Pre-training for Cross-Format Document Understanding
How to use microsoft/xdoc-base-squad2.0 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("question-answering", model="microsoft/xdoc-base-squad2.0")

# Load model directly
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("microsoft/xdoc-base-squad2.0")
model = AutoModelForQuestionAnswering.from_pretrained("microsoft/xdoc-base-squad2.0")

XDoc is a unified pre-trained model that handles different document formats in a single model. With only 36.7% of the parameters of the individual format-specific pre-trained models, XDoc achieves comparable or better performance on downstream tasks, which makes it cost-effective for real-world deployment.
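When the model is loaded directly rather than through the pipeline, it returns start and end logits that still have to be turned into an answer string. The following is a minimal sketch of that decoding step, not the authors' evaluation code. The helper names `best_span` and `answer_question`, the `max_answer_len` default, and the rule that a span at position 0 means "no answer" (the usual SQuAD 2.0 convention) are assumptions made for illustration. It also assumes the checkpoint loads with `AutoModelForQuestionAnswering` as in the snippet above, and it does not mask out question tokens, which a production decoder would do.

```python
from typing import List, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_len: int = 30) -> Tuple[int, int, float]:
    """Return (start, end, score) of the highest-scoring span with start <= end."""
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length limit.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


def answer_question(question: str, context: str,
                    model_id: str = "microsoft/xdoc-base-squad2.0") -> str:
    """Hypothetical helper: extractive QA that returns "" when no answer is found."""
    # Imported lazily so best_span can be used without torch/transformers installed.
    import torch
    from transformers import AutoTokenizer, AutoModelForQuestionAnswering

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForQuestionAnswering.from_pretrained(model_id)

    inputs = tokenizer(question, context, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)

    start, end, _ = best_span(outputs.start_logits[0].tolist(),
                              outputs.end_logits[0].tolist())
    # Assumption: a best span on the first (special) token signals "no answer".
    if start == 0 and end == 0:
        return ""
    answer_ids = inputs["input_ids"][0][start:end + 1]
    return tokenizer.decode(answer_ids, skip_special_tokens=True).strip()
```

Calling `answer_question("Who wrote XDoc?", "XDoc was written by Jingye Chen and colleagues.")` would download the checkpoint on first use. The pipeline route performs equivalent decoding internally, so this helper is only useful when you need access to the raw logits.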
XDoc: Unified Pre-training for Cross-Format Document Understanding. Jingye Chen, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei. EMNLP 2022.
If you find XDoc helpful, please cite us:
@article{chen2022xdoc,
title={XDoc: Unified Pre-training for Cross-Format Document Understanding},
author={Chen, Jingye and Lv, Tengchao and Cui, Lei and Zhang, Cha and Wei, Furu},
journal={arXiv preprint arXiv:2210.02849},
year={2022}
}