nli-popia-v1

Cross-encoder NLI model fine-tuned for deterministic, local, auditable validation of LLM outputs against POPIA (South Africa's Protection of Personal Information Act, 2013) compliance clauses. It is the drop-in model behind POPIAJudge in semantix-ai.

  • Base: cross-encoder/nli-MiniLM2-L6-H768 (~22M params)
  • Training data: 180 hand-authored triples → labrat-aiko/popia-compliance-nli
  • Evaluation: 150-pair pinned holdout, hash-verified by release gate
  • Deployment: 4 INT8 ONNX CPU variants (AVX2 / AVX512 / AVX512-VNNI / ARM64), auto-selected
  • Inference: ~70 ms mean on CPU, $0 API cost, no internet required
  • Release gate: macro F1 delta ≥ 0.10 AND no per-clause regression — enforced in CI on every push to master
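
The hash check on the pinned holdout can be sketched as follows. This is a minimal illustration, not the project's CI code: the helper names `sha256_of_file` and `verify_holdout` are hypothetical, and the use of SHA-256 is an assumption (the card only says the holdout is hash-verified).

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_holdout(path: str, expected_sha256: str) -> bool:
    """True only if the file on disk matches the pinned digest (hypothetical helper)."""
    return sha256_of_file(path) == expected_sha256.lower()
```

A gate built this way would refuse to report metrics whenever `verify_holdout("eval.jsonl", pinned_digest)` returns False, so the evaluation set cannot drift silently between releases.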

Results (150-pair pinned holdout)

Fine-tuned checkpoint versus the stock base model:

| Metric | Stock NLI | POPIA fine-tune | Delta |
| --- | --- | --- | --- |
| Macro F1 | 0.517 | 0.813 | +0.296 |
| Accuracy | 0.707 | 0.833 | +0.127 |
| POPIA consent | 0.612 | 0.902 | +0.290 |
| POPIA minimality | 0.646 | 0.753 | +0.106 |
| POPIA security safeguards | 0.537 | 0.741 | +0.204 |
| POPIA breach notification | 0.405 | 0.790 | +0.385 |
| POPIA cross-border transfers | 0.542 | 0.727 | +0.185 |
| POPIA general processing | 0.400 | 0.893 | +0.493 |
| POPIA data subject rights | 0.400 | 0.883 | +0.483 |

Every clause improved. No regressions. Release gate: PASS.
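
The gate condition (macro F1 delta of at least 0.10 and no per-clause regression) can be expressed in a few lines. The sketch below is illustrative only: the function name `release_gate`, the dictionary layout, and the shortened clause keys are assumptions, not the project's CI code. The numbers are copied from the results above.

```python
def release_gate(stock, tuned, min_macro_delta=0.10):
    """Pass only if macro F1 improves by min_macro_delta and no clause score drops.

    Returns (passed, regressed_clauses).
    """
    macro_delta = tuned["macro_f1"] - stock["macro_f1"]
    regressions = [
        clause
        for clause, score in stock["per_clause"].items()
        if tuned["per_clause"][clause] < score
    ]
    return macro_delta >= min_macro_delta and not regressions, regressions


# Scores from the 150-pair holdout; clause keys are shortened for illustration.
stock = {
    "macro_f1": 0.517,
    "per_clause": {
        "consent": 0.612, "minimality": 0.646, "security_safeguards": 0.537,
        "breach_notification": 0.405, "cross_border_transfers": 0.542,
        "general_processing": 0.400, "data_subject_rights": 0.400,
    },
}
tuned = {
    "macro_f1": 0.813,
    "per_clause": {
        "consent": 0.902, "minimality": 0.753, "security_safeguards": 0.741,
        "breach_notification": 0.790, "cross_border_transfers": 0.727,
        "general_processing": 0.893, "data_subject_rights": 0.883,
    },
}

passed, regressions = release_gate(stock, tuned)
print("PASS" if passed else "FAIL", regressions)
# PASS []
```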

Quick start

pip install 'semantix-ai[popia]'

Then, in Python:

from semantix.judges.popia import POPIAJudge
from semantix.presets.popia import POPIA_CONSENT

judge = POPIAJudge()
verdict = judge.evaluate(
    "Your data will be used to generate marketing offers without your explicit opt-in.",
    POPIA_CONSENT.description,
    threshold=POPIAJudge.recommended_threshold,  # 0.75
)
print(verdict.passed, verdict.score)
# False 0.031

Using the raw ONNX directly (no PyTorch, no semantix dependency):

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer
from huggingface_hub import hf_hub_download

# Download the AVX2 baseline variant and the tokenizer from the Hub.
model_path = hf_hub_download("labrat-aiko/nli-popia-v1", "onnx/model_quint8_avx2.onnx")
tok_path   = hf_hub_download("labrat-aiko/nli-popia-v1", "tokenizer.json")
session = ort.InferenceSession(model_path)
tok = Tokenizer.from_file(tok_path)

# Encode the pair: LLM output first, clause statement second.
encoded = tok.encode(
    "Your data will be used to generate marketing offers without opt-in.",
    "The responsible party is obtaining explicit opt-in consent.",
)
logits = session.run(None, {
    "input_ids": np.array([encoded.ids], dtype=np.int64),
    "attention_mask": np.array([encoded.attention_mask], dtype=np.int64),
})[0][0]

# Softmax over the three class logits; subtracting the max avoids overflow.
exp = np.exp(logits - logits.max())
probs = exp / exp.sum()
# Label order (id2label): {0: contradiction, 1: entailment, 2: neutral}
print("entailment:", probs[1])

Label order (read this before using the scores)

Label indices match the base model's config.id2label:

| Index | Label |
| --- | --- |
| 0 | contradiction |
| 1 | entailment |
| 2 | neutral |

POPIAJudge (semantix-ai ≥ 0.2.0) reads probs[1] as the compliance score. A Python-side bug in semantix-ai v0.1.5–v0.1.13 read probs[2] (neutral) instead; it was fixed in v0.2.0. The ONNX artefacts in this repo are unchanged; only the downstream Python code needed the label-index correction.
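
One way to avoid this kind of index mix-up in your own code is to look the entailment class up by name instead of hard-coding a position. The sketch below assumes the label mapping listed above; `compliance_score` is a hypothetical helper, not part of semantix-ai.

```python
import numpy as np

# Label mapping as documented for this model (matches the base model's config.id2label).
ID2LABEL = {0: "contradiction", 1: "entailment", 2: "neutral"}


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def compliance_score(logits, id2label=ID2LABEL):
    """Entailment probability, found by label name rather than a fixed index."""
    label2id = {label: idx for idx, label in id2label.items()}
    probs = softmax(np.asarray(logits, dtype=np.float64))
    return float(probs[label2id["entailment"]])
```

If you load `config.json` yourself, passing its `id2label` mapping (with integer keys) into `compliance_score` keeps the lookup tied to the shipped configuration.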

Files

| Path | Purpose |
| --- | --- |
| onnx/model_quint8_avx2.onnx | INT8 quantized, AVX2 baseline (x86_64) |
| onnx/model_qint8_avx512.onnx | INT8 quantized, AVX512 (x86_64 Skylake+) |
| onnx/model_qint8_avx512_vnni.onnx | INT8 quantized, AVX512-VNNI (x86_64 Cascade Lake+) |
| onnx/model_qint8_arm64.onnx | INT8 quantized, ARM64 (Apple Silicon, Graviton) |
| tokenizer.json | Rust tokenizers fast-tokenizer config |
| config.json | Model config (shared across variants) |
| eval.jsonl | Pinned 150-pair holdout (bundled for reproducibility) |

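POPIAJudge auto-detects the best variant at load time, so you do not need to pick one manually when using semantix-ai. If you load the raw ONNX files yourself, you choose the file.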
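
If you load the ONNX files directly, variant selection is up to you. The sketch below shows one plausible approach and is not how POPIAJudge does it internally (the card does not describe that). It assumes Linux-style CPU flag names from /proc/cpuinfo (`avx512f`, `avx512_vnni`) and falls back to the AVX2 baseline when no flags are available; `pick_variant` and `linux_cpu_flags` are hypothetical helpers.

```python
import platform

VARIANTS = {
    "arm64": "onnx/model_qint8_arm64.onnx",
    "avx512_vnni": "onnx/model_qint8_avx512_vnni.onnx",
    "avx512": "onnx/model_qint8_avx512.onnx",
    "avx2": "onnx/model_quint8_avx2.onnx",
}


def pick_variant(machine: str, cpu_flags: set) -> str:
    """Choose the most specific variant the CPU supports, else the AVX2 baseline."""
    if machine.lower() in ("arm64", "aarch64"):
        return VARIANTS["arm64"]
    if "avx512_vnni" in cpu_flags:
        return VARIANTS["avx512_vnni"]
    if "avx512f" in cpu_flags:
        return VARIANTS["avx512"]
    return VARIANTS["avx2"]


def linux_cpu_flags(cpuinfo_path="/proc/cpuinfo"):
    """Read the CPU feature flags on Linux; return an empty set elsewhere."""
    try:
        with open(cpuinfo_path) as handle:
            for line in handle:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


print(pick_variant(platform.machine(), linux_cpu_flags()))
```

The returned path can be passed as the filename argument to `hf_hub_download` in the raw ONNX example.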

Training

  • 5 epochs, batch size 8, lr 2e-5, warmup 10%, weight decay 0.01
  • Cross-entropy loss, early stopping on eval_loss against 10% dev split
  • CPU training on 180 rows: ~6 min
  • Reproducible via scripts/train_popia.py
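
To get a feel for how small this training run is, the step counts implied by the hyperparameters can be worked out directly. This is back-of-envelope arithmetic under stated assumptions, not a description of scripts/train_popia.py: it assumes the 10% dev split is carved out of the 180 rows, that the final partial batch is kept, that warmup steps are rounded up, and that the learning rate decays linearly after warmup (the card does not name the scheduler).

```python
import math

ROWS, DEV_FRACTION = 180, 0.10          # from the card; split source is an assumption
EPOCHS, BATCH_SIZE = 5, 8
BASE_LR, WARMUP_RATIO = 2e-5, 0.10

train_rows = ROWS - round(ROWS * DEV_FRACTION)
steps_per_epoch = math.ceil(train_rows / BATCH_SIZE)   # keep the last partial batch
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup to BASE_LR, then linear decay to zero (assumed schedule)."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / max(1, total_steps - warmup_steps)


print(train_rows, steps_per_epoch, total_steps, warmup_steps)
# 162 21 105 11
```

Under these assumptions the whole run is about a hundred optimizer steps, which is consistent with a training time of a few minutes on CPU.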

Intended use

  • Compliance-sensitive audit of LLM outputs where a deterministic, auditable, non-API-dependent judge is required (e.g. CI validation, audit trails, POPIA-aware @validate_intent decorators, MCP tools).
  • Per-clause threshold-tunable gating: each of the 7 POPIA clauses has a pre-tuned F1-optimal threshold shipped in semantix.presets.popia — you can override per-deployment.
  • Educational / research benchmark for local NLI approaches to compliance judging versus LLM-as-judge pipelines.
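
Per-clause gating with per-deployment overrides can be sketched as below. The threshold values and clause keys are placeholders for illustration only (the real pre-tuned values ship in semantix.presets.popia), `gate` and `GateResult` are hypothetical names, and treating a score equal to the threshold as a pass is an assumption. The 0.75 default mirrors the recommended threshold shown in the quick start.

```python
from dataclasses import dataclass

# Placeholder thresholds for illustration; not the values shipped in semantix.presets.popia.
EXAMPLE_THRESHOLDS = {"consent": 0.75, "breach_notification": 0.60}
DEFAULT_THRESHOLD = 0.75  # recommended threshold quoted in the quick start


@dataclass(frozen=True)
class GateResult:
    clause: str
    score: float
    threshold: float

    @property
    def passed(self) -> bool:
        # Assumption: a score exactly at the threshold counts as a pass.
        return self.score >= self.threshold


def gate(clause: str, score: float, overrides=None) -> GateResult:
    """Compare an entailment score with the clause threshold, honouring overrides."""
    thresholds = {**EXAMPLE_THRESHOLDS, **(overrides or {})}
    return GateResult(clause, score, thresholds.get(clause, DEFAULT_THRESHOLD))


print(gate("consent", 0.031).passed)
# False
```

Keeping the overrides as a plain mapping makes a deployment's deviations from the shipped defaults easy to record in an audit trail.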

Limitations

  • Does not replace a POPIA specialist, a DPIA, or the Information Regulator. Flags semantic mismatch between text and a clause intent; not a legal opinion.
  • English-only. No Afrikaans / isiZulu / Sesotho coverage yet.
  • Narrow domain. 7 canonical clauses out of POPIA's full surface; some adjacent areas (direct marketing under §69, automated decision-making under §71) are not separately modelled.
  • Hand-authored eval set (same author as train). External validation is welcome and the eval set is Apache 2.0 so you can add your own cases.
  • Threshold-sensitive. Per-clause F1-optimal thresholds are tuned on this holdout. If your domain's distribution differs (dialect, adversarial inputs, industry jargon), re-tune on your own held-out set — the full recipe is in the semantix-ai repo.
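
Re-tuning a threshold on your own held-out set amounts to sweeping candidate values and keeping the one with the best F1. The sketch below is a generic illustration of that idea, not the recipe from the semantix-ai repo; `f1_at_threshold` and `best_threshold` are hypothetical helpers, and `scores` stands for the entailment probabilities the model assigns to your labelled examples.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 of the 'compliant' class when predicting compliant for score >= threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels, candidates=None):
    """Return the candidate with the highest F1; ties resolve to the lowest candidate."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))


# Toy example: two non-compliant and two compliant items.
scores = [0.10, 0.20, 0.80, 0.90]
labels = [False, False, True, True]
print(best_threshold(scores, labels))
```

With a holdout as small as the one described here, the F1 curve is coarse, so it is worth checking how flat it is around the chosen value before relying on it.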

Citation

@misc{nli-popia-v1-2026,
  author       = {Eland, Akhona},
  title        = {nli-popia-v1: A local NLI cross-encoder fine-tuned for POPIA compliance validation},
  year         = 2026,
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/labrat-aiko/nli-popia-v1}},
  note         = {Apache 2.0}
}

License

Apache 2.0. Commercial and derivative use explicitly permitted. Forking to build sibling compliance models (GDPR, HIPAA, EU AI Act, UK DPA) is actively encouraged — same recipe, different clause corpus. A link back is appreciated but not required.
