NerGuard-0.3B is a multilingual transformer model for Personally Identifiable Information (PII) detection, built on mDeBERTa-v3-base. It performs token-level classification across 20 PII entity types using BIO tagging, covering names, addresses, government IDs, financial data, and contact information across 8 European languages.
Trained on 500K+ samples from AI4Privacy, it achieves 99.63% F1-macro on in-distribution validation. On the out-of-distribution NVIDIA Nemotron-PII benchmark (1,000 samples, 7-system comparison), the base model places 4th of 7 systems on F1-micro and 5th on F1-macro — without any LLM augmentation. For the full hybrid system with entropy-based LLM routing (which ranks 1st on both F1-macro and F1-micro), see the NerGuard GitHub repository.
Note on labels: the model outputs its native AI4Privacy label space (e.g., `GIVENNAME`, `SURNAME`, `SOCIALNUM`). The NerGuard pipeline includes a semantic alignment layer that maps these to benchmark-specific label spaces (e.g., NVIDIA Nemotron-PII uses `first_name`, `ssn`).
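The alignment layer can be pictured as a plain label mapping. The sketch below is illustrative, not the pipeline's actual code: only the `GIVENNAME` → `first_name` and `SOCIALNUM` → `ssn` pairs come from this card; the other entries and the helper name are hypothetical.

```python
# Illustrative semantic alignment: AI4Privacy labels -> benchmark labels.
# Only GIVENNAME->first_name and SOCIALNUM->ssn are confirmed by the card;
# the remaining pairs are hypothetical placeholders.
AI4PRIVACY_TO_NEMOTRON = {
    "GIVENNAME": "first_name",
    "SOCIALNUM": "ssn",
    "SURNAME": "last_name",  # hypothetical
    "EMAIL": "email",        # hypothetical
}

def align_label(label: str) -> str:
    """Return the benchmark label; unmapped labels pass through unchanged."""
    return AI4PRIVACY_TO_NEMOTRON.get(label, label)

print(align_label("GIVENNAME"))  # first_name
print(align_label("ZIPCODE"))    # ZIPCODE (no mapping, passed through)
```

Passing unmapped labels through unchanged keeps the layer safe to apply even to entity types the target benchmark does not cover.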
## Supported Entity Types
| Category | Entity Types |
|---|---|
| Person | GIVENNAME, SURNAME, TITLE |
| Location | CITY, STREET, BUILDINGNUM, ZIPCODE |
| Government ID | IDCARDNUM, PASSPORTNUM, DRIVERLICENSENUM, SOCIALNUM, TAXNUM |
| Financial | CREDITCARDNUMBER |
| Contact | EMAIL, TELEPHONENUM |
| Temporal | DATE, TIME |
| Demographic | AGE, SEX, GENDER |
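Because the model emits BIO tags per token, downstream code must merge `B-`/`I-` tags into entity spans (the `aggregation_strategy` option in `transformers` does this automatically). A minimal hand-rolled decoder, for illustration only:

```python
def bio_to_spans(tags):
    """Merge BIO tags into (entity_type, start_idx, end_idx) spans.

    `tags` are strings like "B-GIVENNAME", "I-GIVENNAME", or "O";
    returned indices are inclusive token positions.
    """
    spans, current = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if current:
                spans.append(tuple(current))
            current = [tag[2:], i, i]          # open a new span
        elif tag.startswith("I-") and current and tag[2:] == current[0]:
            current[2] = i                     # extend the open span
        else:
            if current:
                spans.append(tuple(current))   # close span on "O" or mismatch
            current = None
    if current:
        spans.append(tuple(current))
    return spans

tags = ["O", "O", "O", "B-GIVENNAME", "B-SURNAME"]
print(bio_to_spans(tags))
# [('GIVENNAME', 3, 3), ('SURNAME', 4, 4)]
```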
## Evaluation Results
### In-Distribution: AI4Privacy (validation split)
| Metric | Value |
|---|---|
| F1 (macro) | 99.63% |
| F1 (weighted) | 99.33% |
| Accuracy | 99.26% |
### Out-of-Distribution: NVIDIA Nemotron-PII (1,000 samples)
Tier 2 evaluation: semantic alignment over 16 comparable entity types. Seven systems compared.
| System | F1-macro | F1-micro | Entity-F1 | Latency (ms) |
|---|---|---|---|---|
| NerGuard Hybrid V2 (base + LLM) | 0.5069 | 0.7015 | 0.6634 | 41 |
| Presidio | 0.4933 | 0.5493 | 0.6680 | 86 |
| NerGuard Hybrid V1 | 0.4943 | 0.6862 | 0.6475 | 31 |
| Piiranha | 0.4731 | 0.6501 | 0.6195 | 31 |
| NerGuard Base (this model) | 0.4175 | 0.6105 | 0.6076 | 33 |
| spaCy (en_core_web_trf) | 0.3607 | 0.4175 | 0.5527 | 144 |
| dslim/bert-base-NER | 0.3331 | 0.4821 | 0.6225 | 38 |
The base model (no LLM) achieves 33 ms median latency. The entropy-gated hybrid adds +8.94 pt F1-macro by routing only uncertain spans (~3% of tokens) to an LLM for disambiguation.
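The entropy gate can be sketched as follows; the threshold value and function names here are illustrative, not the repository's actual API:

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a token's class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def needs_llm_review(probs, threshold=0.5):
    """Escalate a token to the LLM only when the model is uncertain.

    `threshold` is an illustrative value; in practice it would be tuned
    so that only a few percent of tokens are routed to the LLM.
    """
    return token_entropy(probs) > threshold

confident = [0.98, 0.01, 0.01]  # peaked distribution -> keep local prediction
uncertain = [0.40, 0.35, 0.25]  # flat distribution   -> escalate to LLM
print(needs_llm_review(confident))  # False
print(needs_llm_review(uncertain))  # True
```

Gating on entropy rather than top-1 probability accounts for cases where several competing labels split the probability mass.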
## Usage
```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="exdsgift/NerGuard-0.3B",
    aggregation_strategy="simple",
)

results = ner("My name is John Smith and my email is john@acme.com")
for entity in results:
    print(f"{entity['word']} -> {entity['entity_group']} ({entity['score']:.2%})")
# John -> GIVENNAME (99.82%)
# Smith -> SURNAME (99.71%)
# john@acme.com -> EMAIL (99.54%)
```
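The character offsets in the pipeline output make redaction straightforward. A sketch (the entity dicts below are hard-coded in the shape the pipeline returns; in practice they would come from `ner(text)`):

```python
def redact(text, entities):
    """Replace each detected span with a [LABEL] placeholder.

    `entities` follow the transformers pipeline output shape:
    dicts with "start", "end" (exclusive), and "entity_group" keys.
    """
    # Work right-to-left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text

text = "My name is John Smith and my email is john@acme.com"
entities = [
    {"entity_group": "GIVENNAME", "start": 11, "end": 15},
    {"entity_group": "SURNAME", "start": 16, "end": 21},
    {"entity_group": "EMAIL", "start": 38, "end": 51},
]
print(redact(text, entities))
# My name is [GIVENNAME] [SURNAME] and my email is [EMAIL]
```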
For the full hybrid pipeline with LLM routing and regex validation:
```python
from src.inference.tester import PIITester

tester = PIITester(model_path="exdsgift/NerGuard-0.3B")
entities = tester.get_entities("John Smith, SSN: 078-05-1120, email: john@acme.com")
```
## Training Details
| Parameter | Value |
|---|---|
| Base model | microsoft/mdeberta-v3-base |
| Dataset | AI4Privacy Open PII Masking 500K |
| Training samples | ~450K |
| Max sequence length | 512 (stride 382) |
| Learning rate | 2e-5 |
| Batch size | 32 |
| Epochs | 3 |
| Hardware | 2× NVIDIA A100 |
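The 512-token window with stride 382 implies overlapping chunks. If `stride` follows the Hugging Face tokenizer convention (the number of tokens shared between consecutive windows), each window advances by 512 − 382 = 130 tokens — that reading of the table is an assumption. A pure-Python sketch of the windowing arithmetic:

```python
def window_offsets(n_tokens, max_len=512, stride=382):
    """Start offsets of overlapping windows over a token sequence.

    Assumes the Hugging Face convention where `stride` is the overlap
    between consecutive windows, so each window advances by
    max_len - stride tokens. The final window may run past n_tokens;
    real tokenizers truncate or pad it.
    """
    step = max_len - stride  # 130 tokens per advance under this assumption
    if n_tokens <= max_len:
        return [0]
    return list(range(0, n_tokens - max_len + step, step))

print(window_offsets(700))  # [0, 130, 260]
print(window_offsets(400))  # [0]
```

The large overlap means most tokens are seen in several windows, which helps entities that would otherwise straddle a chunk boundary.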
## Citation
```bibtex
@mastersthesis{durante2026nerguard,
  title      = {Engineering a Scalable Multilingual PII Detection System
                with mDeBERTa-v3 and LLM-Based Validation},
  author     = {Durante, Gabriele},
  year       = {2026},
  school     = {University of Verona},
  department = {Department of Computer Science}
}
```
Additional self-reported metrics on NVIDIA Nemotron-PII (1,000 samples, Tier 2 evaluation): precision 0.562, recall 0.662.