Language on Demand, Knowledge at Core: Composing LLMs with Encoder-Decoder Translation Models for Extensible Multilinguality
Abstract
XBridge is a compositional architecture that combines pretrained encoder-decoder translation models with large language models to improve multilingual performance while preserving the LLM's general knowledge-processing capabilities.
Large language models (LLMs) exhibit strong general intelligence, yet their multilingual performance remains highly imbalanced. Although LLMs encode substantial cross-lingual knowledge in a unified semantic space, they often struggle to reliably connect this knowledge to low-resource or unseen languages. Fortunately, pretrained encoder-decoder translation models already possess balanced multilingual capability, making them a natural complement to LLMs. In this work, we propose XBridge, a compositional encoder-LLM-decoder architecture that offloads multilingual understanding and generation to external pretrained translation models while preserving the LLM as an English-centric core for general knowledge processing. To address the resulting representation misalignment across models, we introduce lightweight cross-model mapping layers and an optimal-transport-based alignment objective, enabling fine-grained semantic consistency in multilingual generation. Experiments on four LLMs across multilingual understanding, reasoning, summarization, and generation show that XBridge outperforms strong baselines, especially on low-resource and previously unseen languages, without retraining the LLM.
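The abstract describes two components that can be illustrated concretely: a lightweight mapping layer that projects encoder hidden states into the LLM's representation space, and an optimal-transport objective that measures how well the mapped states align with the LLM's states. The sketch below is a minimal, pure-Python illustration under our own assumptions; the linear mapping, entropic (Sinkhorn) regularization, uniform marginals, and all dimensions and names are illustrative choices, not details confirmed by the paper.

```python
import math
import random

random.seed(0)

def linear_map(H, W):
    """Lightweight cross-model mapping: project each hidden state with a d x d matrix W."""
    d = len(W)
    return [[sum(h[k] * W[k][j] for k in range(d)) for j in range(d)] for h in H]

def cost_matrix(A, B):
    """Squared Euclidean cost between every mapped encoder state and every LLM state."""
    return [[sum((a - b) ** 2 for a, b in zip(ra, rb)) for rb in B] for ra in A]

def sinkhorn(C, eps=0.5, iters=200):
    """Entropic-regularized OT plan with uniform marginals (Sinkhorn iterations)."""
    m, n = len(C), len(C[0])
    K = [[math.exp(-c / eps) for c in row] for row in C]
    u, v = [1.0] * m, [1.0] * n
    for _ in range(iters):
        u = [(1.0 / m) / sum(K[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [(1.0 / n) / sum(K[i][j] * u[i] for i in range(m)) for j in range(n)]
    return [[u[i] * K[i][j] * v[j] for j in range(n)] for i in range(m)]

def ot_alignment_loss(H_enc, H_llm, W):
    """OT alignment objective: transport cost between mapped encoder and LLM states."""
    C = cost_matrix(linear_map(H_enc, W), H_llm)
    P = sinkhorn(C)
    return sum(P[i][j] * C[i][j] for i in range(len(C)) for j in range(len(C[0])))

# Toy setup: 3 encoder tokens, 4 LLM tokens, hidden size 2; identity-initialized mapping.
d = 2
H_enc = [[random.gauss(0, 1) for _ in range(d)] for _ in range(3)]
H_llm = [[random.gauss(0, 1) for _ in range(d)] for _ in range(4)]
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

loss = ot_alignment_loss(H_enc, H_llm, W)
print(round(loss, 4))
```

In training, a loss of this form would be minimized with respect to the mapping parameters W only, which is consistent with the paper's claim of parameter-efficient adaptation without retraining the LLM.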
Community
XBridge proposes a new paradigm for multilingual extension, beyond large-scale multilingual training or massive parameter expansion (e.g., MoE). With minimal additional parameters, limited training data, and parameter-efficient training, XBridge brings low-resource and unseen-language performance close to that of external NMT models, substantially narrowing the gap across languages while maintaining or improving high-resource performance, all without retraining the LLM!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment (2026)
- Layer-wise Swapping for Generalizable Multilingual Safety (2026)
- Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training (2026)
- Bootstrapping Embeddings for Low Resource Languages (2026)
- Language Steering for Multilingual In-Context Learning (2026)
- Omnilingual SONAR: Cross-Lingual and Cross-Modal Sentence Embeddings Bridging Massively Multilingual Text and Speech (2026)
- Sparse Shortcuts: Facilitating Efficient Fusion in Multimodal Large Language Models (2026)