Cyrile/EuroBERT-210m-Quality-CL

Text Classification

Cyrile committed on Apr 3, 2025

Commit c5fe49a · verified · 1 Parent(s): 92fef32

Update README.md

Files changed (1)

README.md +16 -16

README.md CHANGED

@@ -41,34 +41,34 @@ We compare two distinct approaches:
 | Catégorie    | Global (NL + CL) | NL            | CL            |
 |:------------:|:----------------:|:-------------:|:-------------:|
-| **Harmfull** | 0.81             | 0.87          | 0.75          |
-| **Low**      | 0.60             | 0.72          | 0.44          |
-| **Medium**   | 0.60             | 0.74          | 0.49          |
-| **High**     | 0.74             | 0.77          | 0.72          |
-| **Accuracy** | **0.70**         | **0.78**      | **0.62**      |
 - **f1-score: Separate Models**
 | Catégorie    | Global (NL + CL) | NL            | CL            |
 |:------------:|:----------------:|:-------------:|:-------------:|
-| **Harmfull** | 0.83             | 0.89          | 0.78          |
-| **Low**      | 0.59             | 0.71          | 0.46          |
-| **Medium**   | 0.63             | 0.77          | 0.49          |
-| **High**     | 0.76             | 0.79          | 0.73          |
-| **Accuracy** | **0.71**         | **0.80**      | **0.63**      |
 ## Key Performance Metrics:
 - **Unified Model (NL + CL)**:
-  - Overall accuracy: ~69%
-  - High reliability on harmful data (f1-score: 0.81)
 - **Separate Models**:
-  - **Natural Language (NL)**: ~79% accuracy
-    - Excellent performance on harmful data (f1-score: 0.89)
-  - **Code Language (CL)**: ~63% accuracy
-    - Good detection of harmful data (f1-score: 0.78)
 ## Training Dataset:
 - Public dataset available: [TempestTeam/dataset-quality](https://huggingface.co/datasets/TempestTeam/dataset-quality)

 | Catégorie    | Global (NL + CL) | NL            | CL            |
 |:------------:|:----------------:|:-------------:|:-------------:|
+| **Harmfull** | 0.86             | 0.93          | 0.79          |
+| **Low**      | 0.62             | 0.81          | 0.40          |
+| **Medium**   | 0.63             | 0.78          | 0.50          |
+| **High**     | 0.77             | 0.81          | 0.74          |
+| **Accuracy** | **0.73**         | **0.83**      | **0.62**      |
 - **f1-score: Separate Models**
 | Catégorie    | Global (NL + CL) | NL            | CL            |
 |:------------:|:----------------:|:-------------:|:-------------:|
+| **Harmfull** | 0.83             | 0.93          | 0.72          |
+| **Low**      | 0.64             | 0.76          | 0.53          |
+| **Medium**   | 0.63             | 0.76          | 0.52          |
+| **High**     | 0.79             | 0.81          | 0.76          |
+| **Accuracy** | **0.73**         | **0.82**      | **0.63**      |
 ## Key Performance Metrics:
 - **Unified Model (NL + CL)**:
+  - Overall accuracy: ~73%
+  - High reliability on harmful data (f1-score: 0.86)
 - **Separate Models**:
+  - **Natural Language (NL)**: ~82% accuracy
+    - Excellent performance on harmful data (f1-score: 0.93)
+  - **Code Language (CL)**: ~63% accuracy
+    - Good detection of harmful data (f1-score: 0.72)
 ## Training Dataset:
 - Public dataset available: [TempestTeam/dataset-quality](https://huggingface.co/datasets/TempestTeam/dataset-quality)
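
For readers comparing the tables in this diff, the following is a minimal sketch of how per-class f1-scores and overall accuracy of this kind are commonly computed. It is not taken from the repository: the function names `per_class_f1` and `accuracy` are hypothetical, the label spelling `Harmful` is normalized from the table rows, and the toy labels are invented purely for illustration, so the numbers it produces are unrelated to the reported results.

```python
from collections import Counter

# Class names follow the table rows; everything below is illustrative only.
LABELS = ["Harmful", "Low", "Medium", "High"]


def per_class_f1(y_true, y_pred, labels):
    """Return {label: f1}, using F1 = 2*TP / (2*TP + FP + FN) per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up toy data (NOT from TempestTeam/dataset-quality).
y_true = ["Harmful", "Harmful", "Low", "Medium", "High", "High"]
y_pred = ["Harmful", "Low", "Low", "Medium", "High", "Medium"]

f1 = per_class_f1(y_true, y_pred, LABELS)
acc = accuracy(y_true, y_pred)
```

Note that the "Accuracy" row in the tables is a single number per column, whereas the other rows are per-class scores; the sketch keeps the two computations separate for that reason.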