# rubert_level2_v2
This model is a fine-tuned version of DeepPavlov/rubert-base-cased for multilabel classification of non-functional software requirements in Russian (Level 2).
It achieves the following results on the evaluation set:
- F1 Micro: 0.9110
- F1 Macro: 0.9110
- F1 Weighted: 0.9120
## Model description
Level 2 classifier in a cascaded requirements classification pipeline. Applied only to fragments classified as IsNonFunctional by Level 1. Classifies into 11 non-functional requirement subcategories:
| Label | Description |
|---|---|
| Availability (A) | Uptime, SLA, availability percentage |
| Fault Tolerance (FT) | Failover, recovery, redundancy |
| Legal (L) | Regulatory compliance, standards, licenses |
| Look & Feel (LF) | Visual style, UI design |
| Maintainability (MN) | Code quality, documentation, tech debt |
| Operability (O) | Monitoring, administration, observability |
| Performance (PE) | Response time, throughput, latency |
| Portability (PO) | Platform and OS compatibility |
| Scalability (SC) | Load scaling, growth capacity |
| Security (SE) | Authentication, authorization, encryption |
| Usability (US) | UX, ease of use, learnability |
The model is part of a cascaded pipeline:

```
Audio → GigaAM-v3 (ASR) → rubert_level1_v2 (L1) → rubert_level2_v2 (L2) → Report
```
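The cascade logic can be sketched as follows. This is a hypothetical illustration of the control flow only: `level1_is_nonfunctional` and `level2_subcategories` are stand-in stubs, not the repository's actual inference code.

```python
# Sketch of the L1 → L2 cascade: Level 2 runs only on fragments that
# Level 1 has marked IsNonFunctional. The two classifier functions below
# are trivial keyword stubs standing in for the real models.
from typing import Dict, List

def level1_is_nonfunctional(fragment: str) -> bool:
    # Stand-in for rubert_level1_v2 (binary IsNonFunctional decision).
    return "must" in fragment or "shall" in fragment

def level2_subcategories(fragment: str) -> List[str]:
    # Stand-in for rubert_level2_v2 (multilabel subcategory prediction).
    return ["PE"] if "latency" in fragment else []

def classify(fragments: List[str]) -> Dict[str, List[str]]:
    report = {}
    for frag in fragments:
        # Fragments Level 1 rejects never reach Level 2.
        if level1_is_nonfunctional(frag):
            report[frag] = level2_subcategories(frag)
    return report
```

In the real pipeline the stubs would be replaced by the two fine-tuned RuBERT models, with Level 2 loaded only for the fragments that pass Level 1.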
Per-class decision thresholds are stored in `thresholds.json` in the eternalGenius/rubert_level1_v2 repository.
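Applying per-class thresholds to the model's sigmoid outputs might look like the sketch below. The label order and threshold values here are illustrative assumptions; the actual mapping should be read from `thresholds.json` and the model config.

```python
# Minimal sketch of per-class thresholding for multilabel output.
# LABELS order and threshold values are assumptions for illustration,
# not the values shipped in thresholds.json.
LABELS = ["A", "FT", "L", "LF", "MN", "O", "PE", "PO", "SC", "SE", "US"]

def apply_thresholds(probs, thresholds, default=0.5):
    """Return label codes whose sigmoid probability meets its threshold."""
    return [lbl for lbl, p in zip(LABELS, probs)
            if p >= thresholds.get(lbl, default)]

thresholds = {"PE": 0.40, "SE": 0.55}  # illustrative values
probs = [0.10, 0.20, 0.10, 0.05, 0.30, 0.10, 0.45, 0.20, 0.10, 0.52, 0.60]
apply_thresholds(probs, thresholds)  # PE passes its 0.40 cut; US passes 0.5
```

Note that per-class thresholds matter here because a single 0.5 cut would drop Performance (0.45) while a tuned 0.40 threshold keeps it.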
## Intended uses & limitations
Intended for subclassification of non-functional requirements in Russian extracted from meeting audio recordings. Should only be applied to fragments already classified as IsNonFunctional by Level 1.
## Training and evaluation data
Same dataset as Level 1, filtered to IsNonFunctional=1 rows only.
Train: 772 examples | Test: 191 examples (11 classes, ~500 examples each).
## Training procedure

### Training hyperparameters
- learning_rate: 5e-06
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW with betas=(0.9, 0.999), epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.06
- num_epochs: 15 (early stopping patience=3)
- max_length: 96
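The warmup ratio above translates into a concrete step count once the dataset and batch size are fixed. A quick back-of-the-envelope check, assuming no gradient accumulation and that the final partial batch is kept:

```python
import math

# Values from the hyperparameter list and dataset description above.
train_examples = 772
batch_size = 16
epochs = 15
warmup_ratio = 0.06

steps_per_epoch = math.ceil(train_examples / batch_size)  # 49
total_steps = steps_per_epoch * epochs                    # 735
warmup_steps = int(total_steps * warmup_ratio)            # 44
```

So the linear scheduler warms up for roughly 44 optimizer steps, just under one epoch, before decaying (early stopping can end training before all 15 epochs run).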
## Per-class results (test set)
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Availability (A) | 1.000 | 0.939 | 0.968 | 98 |
| Fault Tolerance (FT) | 0.981 | 0.920 | 0.949 | 112 |
| Legal (L) | 0.860 | 0.925 | 0.891 | 106 |
| Look & Feel (LF) | 0.957 | 0.918 | 0.938 | 98 |
| Maintainability (MN) | 0.816 | 0.853 | 0.834 | 109 |
| Operability (O) | 0.976 | 0.883 | 0.927 | 94 |
| Performance (PE) | 0.883 | 0.958 | 0.919 | 118 |
| Portability (PO) | 0.911 | 0.944 | 0.927 | 108 |
| Scalability (SC) | 0.971 | 0.952 | 0.962 | 105 |
| Security (SE) | 0.858 | 0.875 | 0.867 | 104 |
| Usability (US) | 0.831 | 0.841 | 0.836 | 82 |
| micro avg | 0.910 | 0.912 | 0.911 | 1134 |
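As a sanity check, the micro-averaged F1 in the last row is consistent with its precision and recall columns via the harmonic-mean formula:

```python
# F1 is the harmonic mean of precision and recall.
precision, recall = 0.910, 0.912  # micro avg row above
f1 = 2 * precision * recall / (precision + recall)
round(f1, 3)  # 0.911, matching the reported micro F1
```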
## Framework versions
- Transformers 4.57.1
- PyTorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.2