lv-mbert-large-conloan-lv-ext

Description

This model is part of a Bachelor's thesis at the University of Latvia: "Contextual approach to Latvian loanword detection: dataset creation and classification experiments".

It is a fine-tuned token-classification model trained on the extended dataset.

Classes and Labels

Dataset Type: Extended (Contrastive)

Labels:

O: Outside
LOAN: Borrowing (Materiālie aizguvumi)
CS: Code-switching (Koda maiņa)
NE: Named Entities (Nosauktās entitātes)
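
As an illustration of how this tag set might be consumed downstream, the sketch below maps the four labels to their descriptions and drops the O tag. The helper names (`normalize_label`, `non_outside`) are hypothetical, and the handling of `B-`/`I-` prefixes is an assumption: this card does not say whether the model emits BIO-prefixed tags.

```python
# Label descriptions as listed in this card.
LABELS = {
    "O": "Outside",
    "LOAN": "Borrowing",
    "CS": "Code-switching",
    "NE": "Named entity",
}


def normalize_label(tag: str) -> str:
    """Strip an optional BIO prefix (assumed, not confirmed by the card)."""
    if tag[:2] in ("B-", "I-"):
        return tag[2:]
    return tag


def non_outside(tags: list[str]) -> list[str]:
    """Return normalized tags, dropping the 'O' (Outside) tag."""
    return [t for t in map(normalize_label, tags) if t != "O"]


# Hypothetical tag sequence, for illustration only.
example_tags = ["O", "B-LOAN", "O", "CS", "I-NE"]
print([LABELS[t] for t in non_outside(example_tags)])
```

If the model turns out to emit bare labels only, `normalize_label` simply returns its input unchanged.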

Performance (k-fold average)

F1 Score: 0.897
Std Dev (σ): 0.006
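
For readers unfamiliar with how a k-fold average and its spread are typically reported, here is a minimal sketch. The per-fold values are placeholders, not the thesis results, and the function name `summarize_folds` is hypothetical. The card does not state the number of folds, the F1 averaging scheme, or whether σ is the population or the sample standard deviation, so the sketch computes both.

```python
from statistics import mean, pstdev, stdev


def summarize_folds(scores: list[float]) -> tuple[float, float, float]:
    """Return (mean, population std, sample std) of per-fold scores."""
    return mean(scores), pstdev(scores), stdev(scores)


# Placeholder per-fold F1 values, NOT the results reported in this card.
fold_f1 = [0.80, 0.82, 0.84]

avg, sigma_pop, sigma_sample = summarize_folds(fold_f1)
print(f"F1 = {avg:.3f}")
print(f"sigma (population) = {sigma_pop:.3f}")
print(f"sigma (sample) = {sigma_sample:.3f}")
```

With few folds the two standard deviations differ noticeably, so it helps to know which one a reported σ refers to.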

Usage

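from transformers import pipeline

# Build a token-classification ("ner") pipeline from the model on the Hub.
nlp = pipeline("ner", model="jorenchik/lv-mbert-large-conloan-lv-ext")
# Returns a list of dicts, one per tagged token (label, score, character offsets).
print(nlp("Šodienas mītings bija ļoti produktīvs."))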
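
Because the pipeline tags subword tokens, a word such as "mītings" may come back as several pieces. The sketch below shows one way to merge adjacent pieces into word-level spans using the character offsets. The function name `merge_subwords` is hypothetical, and the predictions are hand-written for illustration; they are not real model output, and the actual label strings may differ (for example, they may carry BIO prefixes).

```python
def merge_subwords(text: str, preds: list[dict]) -> list[dict]:
    """Merge adjacent same-label predictions whose character spans touch."""
    spans: list[dict] = []
    for p in sorted(preds, key=lambda d: d["start"]):
        last = spans[-1] if spans else None
        if last and last["label"] == p["entity"] and last["end"] == p["start"]:
            last["end"] = p["end"]  # extend the current span
        else:
            spans.append(
                {"label": p["entity"], "start": p["start"], "end": p["end"]}
            )
    # Recover the surface text of each span from the original string.
    for s in spans:
        s["text"] = text[s["start"]:s["end"]]
    return spans


text = "Šodienas mītings bija ļoti produktīvs."

# Hand-written predictions for illustration, NOT real model output.
fake_preds = [
    {"entity": "LOAN", "word": "mīt", "start": 9, "end": 12, "score": 0.9},
    {"entity": "LOAN", "word": "##ings", "start": 12, "end": 16, "score": 0.9},
]

print(merge_subwords(text, fake_preds))
```

The `transformers` pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that performs similar grouping; the manual version above is only meant to make the mechanics visible.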


Model size: 0.4B params
Tensor type: F32 (Safetensors)


Collection: ConLoan-LV: Latvian Language Loanword Detection

Datasets and specialized models from 'Contextual approach to Latvian loanword detection: dataset creation and classification experiments' (2026).