---
license: mit
---
# XDoc
## Introduction
XDoc is a unified pre-trained model that handles different document formats within a single model. With only 36.7% of the parameters, XDoc achieves comparable or better performance on downstream tasks, which makes it cost-effective for real-world deployment.
[XDoc: Unified Pre-training for Cross-Format Document Understanding](https://arxiv.org/abs/2210.02849)
Jingye Chen, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei, EMNLP 2022
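A minimal loading sketch (not part of the official card): the task-specific `Layoutlmv1` classes used by XDoc live in the official unilm repository rather than in the `transformers` package, so loading through the generic Auto classes is an assumption and may require that repository's model code if the architecture is not registered.

```python
from transformers import AutoTokenizer, AutoModel

model_id = "microsoft/xdoc-base-websrc"

# The text backbone is RoBERTa-like, so the tokenizer loads from the hub files.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Encode a question/context pair, as for extractive QA on web pages (WebSRC).
inputs = tokenizer("What is the title?", "Example page text", return_tensors="pt")
outputs = model(**inputs)
```

For the exact fine-tuning and inference pipeline, refer to the XDoc code in the unilm repository.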
## Citation
If you find XDoc helpful, please cite us:
```bibtex
@article{chen2022xdoc,
  title={XDoc: Unified Pre-training for Cross-Format Document Understanding},
  author={Chen, Jingye and Lv, Tengchao and Cui, Lei and Zhang, Cha and Wei, Furu},
  journal={arXiv preprint arXiv:2210.02849},
  year={2022}
}
```