rubert_level2_v2

This model is a fine-tuned version of DeepPavlov/rubert-base-cased for multilabel classification of non-functional software requirements in Russian (Level 2).

It achieves the following results on the evaluation set:

  • F1 Micro: 0.9110
  • F1 Macro: 0.9110
  • F1 Weighted: 0.9120

Model description

Level 2 classifier in a cascaded requirements classification pipeline. Applied only to fragments classified as IsNonFunctional by Level 1. Classifies into 11 non-functional requirement subcategories:

| Label | Description |
|---|---|
| Availability (A) | Uptime, SLA, availability percentage |
| Fault Tolerance (FT) | Failover, recovery, redundancy |
| Legal (L) | Regulatory compliance, standards, licenses |
| Look & Feel (LF) | Visual style, UI design |
| Maintainability (MN) | Code quality, documentation, tech debt |
| Operability (O) | Monitoring, administration, observability |
| Performance (PE) | Response time, throughput, latency |
| Portability (PO) | Platform and OS compatibility |
| Scalability (SC) | Load scaling, growth capacity |
| Security (SE) | Authentication, authorization, encryption |
| Usability (US) | UX, ease of use, learnability |

The model is part of a cascaded pipeline: Audio → GigaAM-v3 (ASR) → rubert_level1_v2 (L1) → rubert_level2_v2 (L2) → Report

Per-class thresholds are stored in thresholds.json in the eternalGenius/rubert_level1_v2 repository.
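A minimal inference sketch under stated assumptions: `thresholds.json` is assumed to map the label abbreviations above to per-class float cutoffs applied to sigmoid probabilities, and the helper names (`apply_thresholds`, `classify`) are illustrative, not part of the released code.

```python
import json

# Label order assumed to match the model's output head (see the table above).
LABELS = ["A", "FT", "L", "LF", "MN", "O", "PE", "PO", "SC", "SE", "US"]

def apply_thresholds(probs, thresholds, labels=LABELS, default=0.5):
    """Keep each label whose probability clears its per-class threshold."""
    return [lab for lab, p in zip(labels, probs) if p >= thresholds.get(lab, default)]

def classify(text):
    # Heavy dependencies are imported lazily so apply_thresholds stays standalone.
    import torch
    from huggingface_hub import hf_hub_download
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("eternalGenius/rubert_level2_v2")
    model = AutoModelForSequenceClassification.from_pretrained("eternalGenius/rubert_level2_v2")
    # Thresholds live in the Level 1 repository, per the note above.
    with open(hf_hub_download("eternalGenius/rubert_level1_v2", "thresholds.json"), encoding="utf-8") as f:
        thresholds = json.load(f)
    enc = tok(text, truncation=True, max_length=96, return_tensors="pt")
    with torch.no_grad():
        probs = torch.sigmoid(model(**enc).logits)[0].tolist()
    return apply_thresholds(probs, thresholds)
```

Because the task is multilabel, any subset of the 11 labels (including none) can be returned for a fragment.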

Intended uses & limitations

Intended for subclassification of non-functional requirements in Russian extracted from meeting audio recordings. Should only be applied to fragments already classified as IsNonFunctional by Level 1.

Training and evaluation data

Same dataset as Level 1, filtered to IsNonFunctional=1 rows only.

Train: 772 examples | Test: 191 examples. Across both splits, each of the 11 classes has roughly 500 labeled examples.

Training procedure

Training hyperparameters

  • learning_rate: 5e-06
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: AdamW with betas=(0.9, 0.999), epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.06
  • num_epochs: 15 (early stopping patience=3)
  • max_length: 96
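The hyperparameters above imply the step counts sketched below. This is a back-of-the-envelope model of the linear scheduler with warmup ratio 0.06, assuming one optimizer step per batch of 16 over the 772 training examples and no gradient accumulation (the card does not state either).

```python
import math

def linear_schedule(step, total_steps, warmup_ratio=0.06, base_lr=5e-6):
    """Linear warmup to base_lr, then linear decay to 0 (HF-style 'linear' scheduler)."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

steps_per_epoch = math.ceil(772 / 16)  # 49 optimizer steps per epoch
total_steps = steps_per_epoch * 15     # 735 steps if all 15 epochs run
```

With early stopping (patience=3) training can halt before all 735 steps, so the decay leg may not reach zero in practice.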

Per-class results (test set)

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Availability (A) | 1.000 | 0.939 | 0.968 | 98 |
| Fault Tolerance (FT) | 0.981 | 0.920 | 0.949 | 112 |
| Legal (L) | 0.860 | 0.925 | 0.891 | 106 |
| Look & Feel (LF) | 0.957 | 0.918 | 0.938 | 98 |
| Maintainability (MN) | 0.816 | 0.853 | 0.834 | 109 |
| Operability (O) | 0.976 | 0.883 | 0.927 | 94 |
| Performance (PE) | 0.883 | 0.958 | 0.919 | 118 |
| Portability (PO) | 0.911 | 0.944 | 0.927 | 108 |
| Scalability (SC) | 0.971 | 0.952 | 0.962 | 105 |
| Security (SE) | 0.858 | 0.875 | 0.867 | 104 |
| Usability (US) | 0.831 | 0.841 | 0.836 | 82 |
| micro avg | 0.910 | 0.912 | 0.911 | 1134 |
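The micro-averaged row pools true positives, false positives, and false negatives across all 11 classes before computing precision and recall once, rather than averaging per-class scores. A minimal sketch of that pooling (the counts in the usage example are illustrative, not the actual confusion counts):

```python
def micro_average(class_counts):
    """Pool (tp, fp, fn) over classes, then compute precision/recall/F1 once."""
    tp = sum(c[0] for c in class_counts)
    fp = sum(c[1] for c in class_counts)
    fn = sum(c[2] for c in class_counts)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts for two classes: (tp, fp, fn)
p, r, f = micro_average([(90, 10, 5), (80, 5, 10)])
```

Micro averaging weights each labeled instance equally, so high-support classes like Performance (118) influence the 0.911 micro F1 more than low-support ones like Usability (82).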

Framework versions

  • Transformers 4.57.1
  • PyTorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2
Model size: 0.2B params (F32, Safetensors)