# rubert_level2_v2
This model is a fine-tuned version of DeepPavlov/rubert-base-cased for multilabel classification of non-functional software requirements in Russian (Level 2).
It achieves the following results on the evaluation set:
- F1 Micro: 0.9110
- F1 Macro: 0.9110
- F1 Weighted: 0.9120
## Model description
Level 2 classifier in a cascaded requirements classification pipeline. Applied only to fragments classified as IsNonFunctional by Level 1. Classifies into 11 non-functional requirement subcategories:
| Label | Description |
|---|---|
| Availability (A) | Uptime, SLA, availability percentage |
| Fault Tolerance (FT) | Failover, recovery, redundancy |
| Legal (L) | Regulatory compliance, standards, licenses |
| Look & Feel (LF) | Visual style, UI design |
| Maintainability (MN) | Code quality, documentation, tech debt |
| Operability (O) | Monitoring, administration, observability |
| Performance (PE) | Response time, throughput, latency |
| Portability (PO) | Platform and OS compatibility |
| Scalability (SC) | Load scaling, growth capacity |
| Security (SE) | Authentication, authorization, encryption |
| Usability (US) | UX, ease of use, learnability |
The model is part of a cascaded pipeline:

```
Audio → GigaAM-v3 (ASR) → rubert_level1_v2 (L1) → rubert_level2_v2 (L2) → Report
```
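The cascade logic can be sketched as follows. This is a hypothetical illustration of the control flow only: `level1_is_nonfunctional` and `level2_subcategories` are stand-in stubs, not the repository's actual inference code.

```python
# Sketch of the L1 → L2 cascade: Level 2 runs only on fragments that
# Level 1 has marked IsNonFunctional. The two classifier functions below
# are trivial keyword stubs standing in for the real models.
from typing import Dict, List

def level1_is_nonfunctional(fragment: str) -> bool:
    # Stand-in for rubert_level1_v2 (binary IsNonFunctional decision).
    return "must" in fragment or "shall" in fragment

def level2_subcategories(fragment: str) -> List[str]:
    # Stand-in for rubert_level2_v2 (multilabel subcategory prediction).
    return ["PE"] if "latency" in fragment else []

def classify(fragments: List[str]) -> Dict[str, List[str]]:
    report = {}
    for frag in fragments:
        # Fragments Level 1 rejects never reach Level 2.
        if level1_is_nonfunctional(frag):
            report[frag] = level2_subcategories(frag)
    return report
```

In the real pipeline the stubs would be replaced by the two fine-tuned RuBERT models, with Level 2 loaded only for the fragments that pass Level 1.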
Per-class decision thresholds are stored in `thresholds.json` in the eternalGenius/rubert_level1_v2 repository.
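Applying per-class thresholds to the model's sigmoid outputs might look like the sketch below. The label order and threshold values here are illustrative assumptions; the actual mapping should be read from `thresholds.json` and the model config.

```python
# Minimal sketch of per-class thresholding for multilabel output.
# LABELS order and threshold values are assumptions for illustration,
# not the values shipped in thresholds.json.
LABELS = ["A", "FT", "L", "LF", "MN", "O", "PE", "PO", "SC", "SE", "US"]

def apply_thresholds(probs, thresholds, default=0.5):
    """Return label codes whose sigmoid probability meets its threshold."""
    return [lbl for lbl, p in zip(LABELS, probs)
            if p >= thresholds.get(lbl, default)]

thresholds = {"PE": 0.40, "SE": 0.55}  # illustrative values
probs = [0.10, 0.20, 0.10, 0.05, 0.30, 0.10, 0.45, 0.20, 0.10, 0.52, 0.60]
apply_thresholds(probs, thresholds)  # PE passes its 0.40 cut; US passes 0.5
```

Note that per-class thresholds matter here because a single 0.5 cut would drop Performance (0.45) while a tuned 0.40 threshold keeps it.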
## Intended uses & limitations
Intended for subclassification of non-functional requirements in Russian extracted from meeting audio recordings. Should only be applied to fragments already classified as IsNonFunctional by Level 1.
## Training and evaluation data
Same dataset as Level 1, filtered to IsNonFunctional=1 rows only.
Train: 772 examples | Test: 191 examples (11 classes, ~500 examples each).
## Training procedure

### Training hyperparameters
- learning_rate: 5e-06
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW with betas=(0.9, 0.999), epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.06
- num_epochs: 15 (early stopping patience=3)
- max_length: 96
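The warmup ratio above translates into a concrete step count once the dataset and batch size are fixed. A quick back-of-the-envelope check, assuming no gradient accumulation and that the final partial batch is kept:

```python
import math

# Values from the hyperparameter list and dataset description above.
train_examples = 772
batch_size = 16
epochs = 15
warmup_ratio = 0.06

steps_per_epoch = math.ceil(train_examples / batch_size)  # 49
total_steps = steps_per_epoch * epochs                    # 735
warmup_steps = int(total_steps * warmup_ratio)            # 44
```

So the linear scheduler warms up for roughly 44 optimizer steps, just under one epoch, before decaying (early stopping can end training before all 15 epochs run).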
## Per-class results (test set)
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Availability (A) | 1.000 | 0.939 | 0.968 | 98 |
| Fault Tolerance (FT) | 0.981 | 0.920 | 0.949 | 112 |
| Legal (L) | 0.860 | 0.925 | 0.891 | 106 |
| Look & Feel (LF) | 0.957 | 0.918 | 0.938 | 98 |
| Maintainability (MN) | 0.816 | 0.853 | 0.834 | 109 |
| Operability (O) | 0.976 | 0.883 | 0.927 | 94 |
| Performance (PE) | 0.883 | 0.958 | 0.919 | 118 |
| Portability (PO) | 0.911 | 0.944 | 0.927 | 108 |
| Scalability (SC) | 0.971 | 0.952 | 0.962 | 105 |
| Security (SE) | 0.858 | 0.875 | 0.867 | 104 |
| Usability (US) | 0.831 | 0.841 | 0.836 | 82 |
| micro avg | 0.910 | 0.912 | 0.911 | 1134 |
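As a sanity check, the micro-averaged F1 in the last row is consistent with its precision and recall columns via the harmonic-mean formula:

```python
# F1 is the harmonic mean of precision and recall.
precision, recall = 0.910, 0.912  # micro avg row above
f1 = 2 * precision * recall / (precision + recall)
round(f1, 3)  # 0.911, matching the reported micro F1
```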
## Framework versions
- Transformers 4.57.1
- PyTorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.2