arxiv:2509.20129

Less is More: The Effectiveness of Compact Typological Language Representations

Published on Sep 24, 2025

AI-generated summary

A pipeline is presented to optimize high-dimensional and sparse linguistic feature datasets through feature selection and imputation, resulting in compact and interpretable representations that enhance linguistic distance metrics and multilingual NLP performance.

Abstract

Linguistic feature datasets such as URIEL+ are valuable for modelling cross-lingual relationships, but their high dimensionality and sparsity, especially for low-resource languages, limit the effectiveness of distance metrics. We propose a pipeline to optimize the URIEL+ typological feature space by combining feature selection and imputation, producing compact yet interpretable typological representations. We evaluate these feature subsets on linguistic distance alignment and downstream tasks, demonstrating that reduced-size representations of language typology can yield more informative distance metrics and improve performance in multilingual NLP applications.
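
The general idea of pairing feature selection with imputation before computing language distances can be illustrated with a small sketch. This is not the paper's pipeline: the abstract does not say which selection or imputation methods are used, so the coverage threshold, the mean imputation, and the cosine distance below are stand-in assumptions. The feature names, language names, and values are invented toy data, not taken from URIEL+.

```python
import math

# Toy typological matrix (hypothetical): rows are languages, columns are
# binary features, None marks a missing value.
FEATURES = ["f_sov", "f_tone", "f_case", "f_rare"]
DATA = {
    "lang_a": [1, 0, 1, None],
    "lang_b": [1, None, 1, None],
    "lang_c": [0, 1, 0, None],
    "lang_d": [0, 1, None, 1],
}


def select_features(data, min_coverage=0.5):
    """Return indices of columns that are observed often enough and not constant.

    Assumed criterion for illustration only.
    """
    rows = list(data.values())
    keep = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        covered = len(observed) / len(rows) >= min_coverage
        varies = len(set(observed)) > 1
        if covered and varies:
            keep.append(j)
    return keep


def impute_mean(data, keep):
    """Restrict rows to the kept columns and fill gaps with the column mean."""
    rows = list(data.values())
    means = {}
    for j in keep:
        observed = [row[j] for row in rows if row[j] is not None]
        means[j] = sum(observed) / len(observed)
    return {
        lang: [row[j] if row[j] is not None else means[j] for j in keep]
        for lang, row in data.items()
    }


def cosine_distance(u, v):
    """1 minus cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


keep = select_features(DATA)
compact = impute_mean(DATA, keep)
print([FEATURES[j] for j in keep])
print(cosine_distance(compact["lang_a"], compact["lang_b"]))
print(cosine_distance(compact["lang_a"], compact["lang_c"]))
```

In this toy setup the mostly unobserved column `f_rare` is dropped before any distance is computed, so the sparsest feature no longer affects the comparison and every remaining vector is fully populated.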
