
NerGuard-0.3B is a multilingual transformer model for Personally Identifiable Information (PII) detection, built on mDeBERTa-v3-base. It performs token-level classification across 20 PII entity types using BIO tagging, covering names, addresses, government IDs, financial data, and contact information across 8 European languages.

Trained on 500K+ samples from AI4Privacy, it achieves 99.63% F1-macro on in-distribution validation. On the out-of-distribution NVIDIA Nemotron-PII benchmark (1,000 samples, 7-system comparison), the base model ranks 5th of 7 systems on F1-macro and 6th on Entity-F1 — see the results table below — without any LLM augmentation. For the full hybrid system with entropy-based LLM routing (which ranks 1st on both F1-macro and F1-micro), see the NerGuard GitHub repository.

Note on labels: The model outputs its native AI4Privacy label space (e.g., GIVENNAME, SURNAME, SOCIALNUM). The NerGuard pipeline includes a semantic alignment layer that maps these to benchmark-specific label spaces (e.g., NVIDIA Nemotron-PII uses first_name, ssn).
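Conceptually, the alignment layer behaves like a mapping from native labels to the benchmark's label space. The sketch below is illustrative only: the GIVENNAME → first_name and SOCIALNUM → ssn pairs come from the note above, while the remaining target names are assumptions, not NerGuard's actual mapping table.

```python
# Illustrative AI4Privacy -> Nemotron-PII-style label mapping.
# Only GIVENNAME -> first_name and SOCIALNUM -> ssn are stated in this card;
# the other target names are hypothetical placeholders.
AI4PRIVACY_TO_NEMOTRON = {
    "GIVENNAME": "first_name",
    "SURNAME": "last_name",      # assumed target name
    "SOCIALNUM": "ssn",
    "EMAIL": "email",            # assumed target name
    "TELEPHONENUM": "phone",     # assumed target name
}

def align_label(native_label: str) -> str:
    """Map a native model label into the benchmark label space,
    falling back to the native label when no mapping exists."""
    return AI4PRIVACY_TO_NEMOTRON.get(native_label, native_label)
```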

Supported Entity Types

| Category | Entity Types |
|---|---|
| Person | GIVENNAME, SURNAME, TITLE |
| Location | CITY, STREET, BUILDINGNUM, ZIPCODE |
| Government ID | IDCARDNUM, PASSPORTNUM, DRIVERLICENSENUM, SOCIALNUM, TAXNUM |
| Financial | CREDITCARDNUMBER |
| Contact | EMAIL, TELEPHONENUM |
| Temporal | DATE, TIME |
| Demographic | AGE, SEX, GENDER |
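Under BIO tagging, each of the 20 types yields a B- (begin) and an I- (inside) tag alongside the shared O tag, so the classification head would carry 41 labels (assuming every type takes both tag variants). A small sketch:

```python
ENTITY_TYPES = [
    "GIVENNAME", "SURNAME", "TITLE",
    "CITY", "STREET", "BUILDINGNUM", "ZIPCODE",
    "IDCARDNUM", "PASSPORTNUM", "DRIVERLICENSENUM", "SOCIALNUM", "TAXNUM",
    "CREDITCARDNUMBER",
    "EMAIL", "TELEPHONENUM",
    "DATE", "TIME",
    "AGE", "SEX", "GENDER",
]

# "O" plus B-/I- variants of every type: 1 + 2 * 20 = 41 labels.
BIO_LABELS = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]

# Token-level tags for an example sentence:
tokens = ["My", "name", "is", "John", "Smith"]
tags   = ["O", "O", "O", "B-GIVENNAME", "B-SURNAME"]
```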

Evaluation Results

In-Distribution: AI4Privacy (validation split)

| Metric | Value |
|---|---|
| F1 (macro) | 99.63% |
| F1 (weighted) | 99.33% |
| Accuracy | 99.26% |

Out-of-Distribution: NVIDIA Nemotron-PII (1,000 samples)

Tier 2 evaluation: semantic alignment over 16 comparable entity types. Seven systems compared.

| System | F1-macro | F1-micro | Entity-F1 | Latency (ms) |
|---|---|---|---|---|
| NerGuard Hybrid V2 (base + LLM) | 0.5069 | 0.7015 | 0.6634 | 41 |
| NerGuard Hybrid V1 | 0.4943 | 0.6862 | 0.6475 | 31 |
| Presidio | 0.4933 | 0.5493 | 0.6680 | 86 |
| Piiranha | 0.4731 | 0.6501 | 0.6195 | 31 |
| NerGuard Base (this model) | 0.4175 | 0.6105 | 0.6076 | 33 |
| spaCy (en_core_web_trf) | 0.3607 | 0.4175 | 0.5527 | 144 |
| dslim/bert-base-NER | 0.3331 | 0.4821 | 0.6225 | 38 |

The base model (no LLM) achieves 33 ms median latency. The entropy-gated hybrid adds +8.94 pt F1-macro by routing only uncertain spans (~3% of tokens) to an LLM for disambiguation.
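A minimal sketch of the entropy gate idea: compute the Shannon entropy of each token's class distribution and route only tokens above a threshold to the LLM. The 0.5-nat threshold and the function interfaces here are illustrative assumptions, not NerGuard's actual values.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one token's class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def route_uncertain_tokens(token_probs, threshold=0.5):
    """Return indices of tokens whose predictive entropy exceeds the
    threshold; only these would be sent to the LLM for disambiguation."""
    return [i for i, probs in enumerate(token_probs)
            if token_entropy(probs) > threshold]

# A confident token stays local; an ambiguous one gets routed:
confident = [0.98, 0.01, 0.01]   # low entropy
ambiguous = [0.40, 0.35, 0.25]   # high entropy
route_uncertain_tokens([confident, ambiguous])  # -> [1]
```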

Usage

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="exdsgift/NerGuard-0.3B",
    aggregation_strategy="simple"
)

results = ner("My name is John Smith and my email is john@acme.com")
for entity in results:
    print(f"{entity['word']} -> {entity['entity_group']} ({entity['score']:.2%})")
# John  -> GIVENNAME (99.82%)
# Smith -> SURNAME   (99.71%)
# john@acme.com -> EMAIL (99.54%)
```
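A common downstream use is redaction. Since the Transformers pipeline output includes character offsets (`start`/`end`), detected spans can be masked in place; the entity offsets below are illustrative, hand-written to match the example sentence:

```python
def redact(text, entities, placeholder="[{label}]"):
    """Replace detected PII spans with a label placeholder, working
    right-to-left so earlier character offsets stay valid."""
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = (text[:ent["start"]]
                + placeholder.format(label=ent["entity_group"])
                + text[ent["end"]:])
    return text

# Hand-written offsets matching the example sentence above:
entities = [
    {"entity_group": "GIVENNAME", "start": 11, "end": 15},
    {"entity_group": "SURNAME", "start": 16, "end": 21},
    {"entity_group": "EMAIL", "start": 38, "end": 51},
]
print(redact("My name is John Smith and my email is john@acme.com", entities))
# My name is [GIVENNAME] [SURNAME] and my email is [EMAIL]
```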

For the full hybrid pipeline with LLM routing and regex validation:

```python
from src.inference.tester import PIITester

tester = PIITester(model_path="exdsgift/NerGuard-0.3B")
entities = tester.get_entities("John Smith, SSN: 078-05-1120, email: john@acme.com")
```
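The regex validation stage can be sketched as a post-filter that keeps structured entities only when their surface form matches the expected pattern. The patterns below are illustrative assumptions, not NerGuard's actual rules:

```python
import re

# Illustrative patterns; the real pipeline's validation rules may differ.
VALIDATORS = {
    "SOCIALNUM": re.compile(r"\d{3}-\d{2}-\d{4}"),          # US-style SSN
    "EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "CREDITCARDNUMBER": re.compile(r"(?:\d[ -]?){13,19}"),
}

def validate(entity_group: str, text: str) -> bool:
    """Accept an entity if no validator exists for its type,
    otherwise require the full surface form to match the pattern."""
    pattern = VALIDATORS.get(entity_group)
    return pattern is None or bool(pattern.fullmatch(text))
```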

Training Details

| Parameter | Value |
|---|---|
| Base model | microsoft/mdeberta-v3-base |
| Dataset | AI4Privacy Open PII Masking 500K |
| Training samples | ~450K |
| Max sequence length | 512 (stride 382) |
| Learning rate | 2e-5 |
| Batch size | 32 |
| Epochs | 3 |
| Hardware | 2× NVIDIA A100 |
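The max-length/stride setting implies a sliding window over documents longer than 512 tokens. Assuming the Hugging Face tokenizer convention, where `stride` counts the tokens shared between consecutive windows, each window advances 512 - 382 = 130 tokens. A dependency-free sketch of that chunking:

```python
def sliding_windows(token_ids, max_length=512, stride=382):
    """Split a long token sequence into overlapping windows.
    Assumes the Transformers convention: `stride` is the number of
    tokens shared between consecutive windows, so each window
    advances by max_length - stride (here 130) tokens."""
    step = max_length - stride
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break
    return windows

windows = sliding_windows(list(range(1000)))
# 5 windows; window 1 starts at token 130 and shares 382 tokens with window 0
```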

Citation

```bibtex
@mastersthesis{durante2026nerguard,
  title      = {Engineering a Scalable Multilingual PII Detection System
                with mDeBERTa-v3 and LLM-Based Validation},
  author     = {Durante, Gabriele},
  year       = {2026},
  school     = {University of Verona},
  department = {Department of Computer Science}
}
```


Reported Metrics (self-reported)

| Metric | Dataset | Value |
|---|---|---|
| F1 (macro) | AI4Privacy Open PII Masking 500K (validation) | 0.996 |
| F1 (weighted) | AI4Privacy Open PII Masking 500K (validation) | 0.993 |
| Accuracy | AI4Privacy Open PII Masking 500K (validation) | 0.993 |
| F1 (macro) | NVIDIA Nemotron-PII (1,000 samples, Tier 2, 16 aligned entity types) | 0.417 |
| F1 (micro) | NVIDIA Nemotron-PII (1,000 samples, Tier 2, 16 aligned entity types) | 0.611 |
| Entity F1 (span-level) | NVIDIA Nemotron-PII (1,000 samples, Tier 2, 16 aligned entity types) | 0.608 |
| Precision | NVIDIA Nemotron-PII (1,000 samples, Tier 2, 16 aligned entity types) | 0.562 |
| Recall | NVIDIA Nemotron-PII (1,000 samples, Tier 2, 16 aligned entity types) | 0.662 |