NerGuard-0.3B is a multilingual transformer model for Personally Identifiable Information (PII) detection, built on mDeBERTa-v3-base. It performs token-level classification across 20 PII entity types using BIO tagging, covering names, addresses, government IDs, financial data, and contact information across 8 European languages.
Trained on 500K+ samples from AI4Privacy, it achieves 99.63% F1-macro on in-distribution validation. On the out-of-distribution NVIDIA Nemotron-PII benchmark (1,000 samples, 7-system comparison), the base model places 4th of 7 systems on F1-micro and 5th on F1-macro — without any LLM augmentation. For the full hybrid system with entropy-based LLM routing (which ranks 1st on both F1-macro and F1-micro), see the NerGuard GitHub repository.
Note on labels: the model outputs its native AI4Privacy label space (e.g., `GIVENNAME`, `SURNAME`, `SOCIALNUM`). The NerGuard pipeline includes a semantic alignment layer that maps these to benchmark-specific label spaces (e.g., NVIDIA Nemotron-PII uses `first_name`, `ssn`).
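The alignment layer can be pictured as a plain label mapping. The sketch below is illustrative, not the pipeline's actual code: only the `GIVENNAME` → `first_name` and `SOCIALNUM` → `ssn` pairs come from this card; the other entries and the helper name are hypothetical.

```python
# Illustrative semantic alignment: AI4Privacy labels -> benchmark labels.
# Only GIVENNAME->first_name and SOCIALNUM->ssn are confirmed by the card;
# the remaining pairs are hypothetical placeholders.
AI4PRIVACY_TO_NEMOTRON = {
    "GIVENNAME": "first_name",
    "SOCIALNUM": "ssn",
    "SURNAME": "last_name",  # hypothetical
    "EMAIL": "email",        # hypothetical
}

def align_label(label: str) -> str:
    """Return the benchmark label; unmapped labels pass through unchanged."""
    return AI4PRIVACY_TO_NEMOTRON.get(label, label)

print(align_label("GIVENNAME"))  # first_name
print(align_label("ZIPCODE"))    # ZIPCODE (no mapping, passed through)
```

Passing unmapped labels through unchanged keeps the layer safe to apply even to entity types the target benchmark does not cover.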
## Supported Entity Types
| Category | Entity Types |
|---|---|
| Person | GIVENNAME, SURNAME, TITLE |
| Location | CITY, STREET, BUILDINGNUM, ZIPCODE |
| Government ID | IDCARDNUM, PASSPORTNUM, DRIVERLICENSENUM, SOCIALNUM, TAXNUM |
| Financial | CREDITCARDNUMBER |
| Contact | EMAIL, TELEPHONENUM |
| Temporal | DATE, TIME |
| Demographic | AGE, SEX, GENDER |
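Because the model emits BIO tags per token, downstream code must merge `B-`/`I-` tags into entity spans (the `aggregation_strategy` option in `transformers` does this automatically). A minimal hand-rolled decoder, for illustration only:

```python
def bio_to_spans(tags):
    """Merge BIO tags into (entity_type, start_idx, end_idx) spans.

    `tags` are strings like "B-GIVENNAME", "I-GIVENNAME", or "O";
    returned indices are inclusive token positions.
    """
    spans, current = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if current:
                spans.append(tuple(current))
            current = [tag[2:], i, i]          # open a new span
        elif tag.startswith("I-") and current and tag[2:] == current[0]:
            current[2] = i                     # extend the open span
        else:
            if current:
                spans.append(tuple(current))   # close span on "O" or mismatch
            current = None
    if current:
        spans.append(tuple(current))
    return spans

tags = ["O", "O", "O", "B-GIVENNAME", "B-SURNAME"]
print(bio_to_spans(tags))
# [('GIVENNAME', 3, 3), ('SURNAME', 4, 4)]
```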
## Evaluation Results
### In-Distribution: AI4Privacy (validation split)
| Metric | Value |
|---|---|
| F1 (macro) | 99.63% |
| F1 (weighted) | 99.33% |
| Accuracy | 99.26% |
### Out-of-Distribution: NVIDIA Nemotron-PII (1,000 samples)
Tier 2 evaluation: semantic alignment over 16 comparable entity types. Seven systems compared.
| System | F1-macro | F1-micro | Entity-F1 | Latency (ms) |
|---|---|---|---|---|
| NerGuard Hybrid V2 (base + LLM) | 0.5069 | 0.7015 | 0.6634 | 41 |
| Presidio | 0.4933 | 0.5493 | 0.6680 | 86 |
| NerGuard Hybrid V1 | 0.4943 | 0.6862 | 0.6475 | 31 |
| Piiranha | 0.4731 | 0.6501 | 0.6195 | 31 |
| NerGuard Base (this model) | 0.4175 | 0.6105 | 0.6076 | 33 |
| spaCy (en_core_web_trf) | 0.3607 | 0.4175 | 0.5527 | 144 |
| dslim/bert-base-NER | 0.3331 | 0.4821 | 0.6225 | 38 |
The base model (no LLM) achieves 33 ms median latency. The entropy-gated hybrid adds +8.94 pt F1-macro by routing only uncertain spans (~3% of tokens) to an LLM for disambiguation.
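The entropy gate can be sketched as follows; the threshold value and function names here are illustrative, not the repository's actual API:

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a token's class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def needs_llm_review(probs, threshold=0.5):
    """Escalate a token to the LLM only when the model is uncertain.

    `threshold` is an illustrative value; in practice it would be tuned
    so that only a few percent of tokens are routed to the LLM.
    """
    return token_entropy(probs) > threshold

confident = [0.98, 0.01, 0.01]  # peaked distribution -> keep local prediction
uncertain = [0.40, 0.35, 0.25]  # flat distribution   -> escalate to LLM
print(needs_llm_review(confident))  # False
print(needs_llm_review(uncertain))  # True
```

Gating on entropy rather than top-1 probability accounts for cases where several competing labels split the probability mass.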
## Usage
```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="exdsgift/NerGuard-0.3B",
    aggregation_strategy="simple",
)

results = ner("My name is John Smith and my email is john@acme.com")
for entity in results:
    print(f"{entity['word']} -> {entity['entity_group']} ({entity['score']:.2%})")
# John -> GIVENNAME (99.82%)
# Smith -> SURNAME (99.71%)
# john@acme.com -> EMAIL (99.54%)
```
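The character offsets in the pipeline output make redaction straightforward. A sketch (the entity dicts below are hard-coded in the shape the pipeline returns; in practice they would come from `ner(text)`):

```python
def redact(text, entities):
    """Replace each detected span with a [LABEL] placeholder.

    `entities` follow the transformers pipeline output shape:
    dicts with "start", "end" (exclusive), and "entity_group" keys.
    """
    # Work right-to-left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text

text = "My name is John Smith and my email is john@acme.com"
entities = [
    {"entity_group": "GIVENNAME", "start": 11, "end": 15},
    {"entity_group": "SURNAME", "start": 16, "end": 21},
    {"entity_group": "EMAIL", "start": 38, "end": 51},
]
print(redact(text, entities))
# My name is [GIVENNAME] [SURNAME] and my email is [EMAIL]
```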
For the full hybrid pipeline with LLM routing and regex validation:
```python
from src.inference.tester import PIITester

tester = PIITester(model_path="exdsgift/NerGuard-0.3B")
entities = tester.get_entities("John Smith, SSN: 078-05-1120, email: john@acme.com")
```
## Training Details
| Parameter | Value |
|---|---|
| Base model | microsoft/mdeberta-v3-base |
| Dataset | AI4Privacy Open PII Masking 500K |
| Training samples | ~450K |
| Max sequence length | 512 (stride 382) |
| Learning rate | 2e-5 |
| Batch size | 32 |
| Epochs | 3 |
| Hardware | 2× NVIDIA A100 |
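The 512-token window with stride 382 implies overlapping chunks. If `stride` follows the Hugging Face tokenizer convention (the number of tokens shared between consecutive windows), each window advances by 512 − 382 = 130 tokens — that reading of the table is an assumption. A pure-Python sketch of the windowing arithmetic:

```python
def window_offsets(n_tokens, max_len=512, stride=382):
    """Start offsets of overlapping windows over a token sequence.

    Assumes the Hugging Face convention where `stride` is the overlap
    between consecutive windows, so each window advances by
    max_len - stride tokens. The final window may run past n_tokens;
    real tokenizers truncate or pad it.
    """
    step = max_len - stride  # 130 tokens per advance under this assumption
    if n_tokens <= max_len:
        return [0]
    return list(range(0, n_tokens - max_len + step, step))

print(window_offsets(700))  # [0, 130, 260]
print(window_offsets(400))  # [0]
```

The large overlap means most tokens are seen in several windows, which helps entities that would otherwise straddle a chunk boundary.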
## Citation
```bibtex
@mastersthesis{durante2026nerguard,
  title      = {Engineering a Scalable Multilingual PII Detection System
                with mDeBERTa-v3 and LLM-Based Validation},
  author     = {Durante, Gabriele},
  year       = {2026},
  school     = {University of Verona},
  department = {Department of Computer Science}
}
```
Additional self-reported metrics on NVIDIA Nemotron-PII (1,000 samples, Tier 2 evaluation): precision 0.562, recall 0.662.