nli-popia-v1

Cross-encoder NLI model fine-tuned for deterministic, local, auditable validation of LLM outputs against POPIA (South Africa's Protection of Personal Information Act, 2013) compliance clauses. It is the drop-in model behind POPIAJudge in semantix-ai.

Base: cross-encoder/nli-MiniLM2-L6-H768 (~22M params)
Training data: 180 hand-authored triples, published as labrat-aiko/popia-compliance-nli
Evaluation: 150-pair pinned holdout, hash-verified by release gate
Deployment: 4 INT8 ONNX CPU variants (AVX2 / AVX512 / AVX512-VNNI / ARM64), auto-selected
Inference: ~70 ms mean on CPU, $0 API cost, no internet required
Release gate: macro F1 delta ≥ 0.10 AND no per-clause regression — enforced in CI on every push to master
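
The hash pin on the holdout can be pictured roughly as follows. This is a minimal sketch, not the project's actual gate code: the function names are hypothetical, SHA-256 is an assumed choice of hash, and the expected digest would be whatever value the release gate pins.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path, expected_hex):
    """True only if the file content matches the pinned digest exactly."""
    return file_sha256(path) == expected_hex.lower()
```

A gate built this way would refuse to report metrics when `verify_pinned("eval.jsonl", pinned_digest)` is false, so a silently edited holdout cannot produce a passing release.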

Results (150-pair pinned holdout)

Fine-tuned checkpoint versus the stock base model:

Metric	Stock NLI	POPIA fine-tune	Delta
Macro F1	0.517	0.813	+0.296
Accuracy	0.707	0.833	+0.127
POPIA consent	0.612	0.902	+0.290
POPIA minimality	0.646	0.753	+0.106
POPIA security safeguards	0.537	0.741	+0.204
POPIA breach notification	0.405	0.790	+0.385
POPIA cross-border transfers	0.542	0.727	+0.185
POPIA general processing	0.400	0.893	+0.493
POPIA data subject rights	0.400	0.883	+0.483

Every clause improved. No regressions. Release gate: PASS.
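
The gate rule can be checked against the table by hand. The sketch below is illustrative only and is not the CI script: `release_gate` is a hypothetical name, and it assumes "regression" means a fine-tuned clause score below the stock score for the same clause.

```python
def release_gate(stock, tuned, macro_stock, macro_tuned, min_delta=0.10):
    """Pass only if macro F1 improves by min_delta and no clause regresses."""
    regressions = sorted(c for c in stock if tuned[c] < stock[c])
    # Round to 3 decimals so float noise cannot flip a borderline delta
    delta = round(macro_tuned - macro_stock, 3)
    return delta >= min_delta and not regressions, delta, regressions


# Per-clause scores copied from the results table
stock = {"consent": 0.612, "minimality": 0.646, "security safeguards": 0.537,
         "breach notification": 0.405, "cross-border transfers": 0.542,
         "general processing": 0.400, "data subject rights": 0.400}
tuned = {"consent": 0.902, "minimality": 0.753, "security safeguards": 0.741,
         "breach notification": 0.790, "cross-border transfers": 0.727,
         "general processing": 0.893, "data subject rights": 0.883}

passed, delta, regressions = release_gate(stock, tuned, 0.517, 0.813)
print(passed, delta, regressions)
```

With the published numbers the macro F1 delta clears the 0.10 floor and the regression list is empty, which matches the PASS verdict.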

Quick start

pip install 'semantix-ai[popia]'

from semantix.judges.popia import POPIAJudge
from semantix.presets.popia import POPIA_CONSENT

judge = POPIAJudge()
verdict = judge.evaluate(
    "Your data will be used to generate marketing offers without your explicit opt-in.",
    POPIA_CONSENT.description,
    threshold=POPIAJudge.recommended_threshold,  # 0.75
)
print(verdict.passed, verdict.score)
# False 0.031

Using the raw ONNX directly (no PyTorch, no semantix dependency):

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer
from huggingface_hub import hf_hub_download

model_path = hf_hub_download("labrat-aiko/nli-popia-v1", "onnx/model_quint8_avx2.onnx")
tok_path   = hf_hub_download("labrat-aiko/nli-popia-v1", "tokenizer.json")
session = ort.InferenceSession(model_path)
tok = Tokenizer.from_file(tok_path)

encoded = tok.encode(
    "Your data will be used to generate marketing offers without opt-in.",
    "The responsible party is obtaining explicit opt-in consent.",
)
logits = session.run(None, {
    "input_ids": np.array([encoded.ids], dtype=np.int64),
    "attention_mask": np.array([encoded.attention_mask], dtype=np.int64),
})[0][0]
# Softmax; subtracting the max first keeps np.exp from overflowing on large logits
exp = np.exp(logits - logits.max())
probs = exp / exp.sum()
# Label order (id2label): {0: contradiction, 1: entailment, 2: neutral}
print("entailment:", probs[1])

Label order (read this before using the scores)

Label indices match the base model's config.id2label:

Index	Label
0	contradiction
1	entailment
2	neutral

POPIAJudge (semantix-ai ≥ 0.2.0) reads probs[1] (entailment) as the compliance score. semantix-ai v0.1.5 through v0.1.13 had a Python-side bug that read probs[2] (neutral) instead; it was fixed in v0.2.0. The ONNX artefacts in this repo are unchanged: only the downstream Python code needed the label-index correction.
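
For code that consumes the raw logits, naming the labels once makes an index mix-up harder to repeat. The helper below is a hypothetical sketch, not part of semantix-ai; it assumes the index order in the table and uses 0.75 as the default threshold, the value shown for `POPIAJudge.recommended_threshold` in the quick start.

```python
import math

# Index order from the label table: 0, 1, 2
LABELS = ("contradiction", "entailment", "neutral")


def softmax(logits):
    """Numerically stable softmax over a short list of floats."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def compliance_verdict(logits, threshold=0.75):
    """Return (passed, score, probabilities keyed by label name)."""
    by_label = dict(zip(LABELS, softmax(logits)))
    score = by_label["entailment"]
    return score >= threshold, score, by_label
```

Looking the score up by name rather than by position means a change in label order only has to be corrected in `LABELS`.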

Files

Path	Purpose
`onnx/model_quint8_avx2.onnx`	INT8 quantized, AVX2 baseline (x86_64)
`onnx/model_qint8_avx512.onnx`	INT8 quantized, AVX512 (x86_64 Skylake+)
`onnx/model_qint8_avx512_vnni.onnx`	INT8 quantized, AVX512-VNNI (x86_64 Cascade Lake+)
`onnx/model_qint8_arm64.onnx`	INT8 quantized, ARM64 (Apple Silicon, Graviton)
`tokenizer.json`	Rust `tokenizers` fast-tokenizer config
`config.json`	Model config (shared across variants)
`eval.jsonl`	Pinned 150-pair holdout (bundled for reproducibility)

POPIAJudge auto-detects the best variant at load time; you never pick manually.
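
If you load the ONNX files without POPIAJudge, you need your own selection step. The following is one possible sketch and makes no claim about how POPIAJudge detects the CPU: the function names are hypothetical, the flag names are the ones Linux reports in /proc/cpuinfo, and anything undetected falls back to the AVX2 baseline file.

```python
import platform

VARIANTS = {
    "arm64": "onnx/model_qint8_arm64.onnx",
    "avx512_vnni": "onnx/model_qint8_avx512_vnni.onnx",
    "avx512": "onnx/model_qint8_avx512.onnx",
    "avx2": "onnx/model_quint8_avx2.onnx",
}


def pick_variant(machine, cpu_flags):
    """Map an architecture name and a set of CPU flags to a file in this repo."""
    if machine.lower() in ("arm64", "aarch64"):
        return VARIANTS["arm64"]
    if "avx512_vnni" in cpu_flags:
        return VARIANTS["avx512_vnni"]
    if "avx512f" in cpu_flags:
        return VARIANTS["avx512"]
    return VARIANTS["avx2"]  # baseline x86_64 file


def linux_cpu_flags(path="/proc/cpuinfo"):
    """Best-effort flag set from /proc/cpuinfo; empty set if unavailable."""
    try:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


print(pick_variant(platform.machine(), linux_cpu_flags()))
```

The returned path can be passed to `hf_hub_download` in place of the hard-coded AVX2 filename used in the raw ONNX example.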

Training

5 epochs, batch size 8, lr 2e-5, warmup 10%, weight decay 0.01
Cross-entropy loss, early stopping on eval_loss against 10% dev split
CPU training on 180 rows: ~6 min
Reproducible via scripts/train_popia.py
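
To give a sense of scale, the hyperparameters above imply a very short schedule. The arithmetic below is a rough sketch under stated assumptions: the 10% dev split is carved out of the 180 rows, the last partial batch is kept, and warmup steps are truncated to an integer. scripts/train_popia.py may count differently.

```python
import math

rows, dev_fraction = 180, 0.10
batch_size, epochs, warmup_fraction = 8, 5, 0.10

dev_rows = round(rows * dev_fraction)                  # rows held out for early stopping
train_rows = rows - dev_rows                           # rows seen by the optimiser
steps_per_epoch = math.ceil(train_rows / batch_size)   # last partial batch kept
total_steps = steps_per_epoch * epochs
warmup_steps = int(total_steps * warmup_fraction)

print(dev_rows, train_rows, steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the run is on the order of a hundred optimiser steps, which is consistent with a CPU training time of a few minutes.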

Intended use

Compliance-sensitive audit of LLM outputs where a deterministic, auditable, non-API-dependent judge is required (e.g. CI validation, audit trails, POPIA-aware @validate_intent decorators, MCP tools).
Per-clause threshold-tunable gating: each of the 7 POPIA clauses has a pre-tuned F1-optimal threshold shipped in semantix.presets.popia — you can override per-deployment.
Educational / research benchmark for local NLI approaches to compliance judging versus LLM-as-judge pipelines.

Limitations

Does not replace a POPIA specialist, a DPIA, or the Information Regulator. Flags semantic mismatch between text and a clause intent; not a legal opinion.
English-only. No Afrikaans / isiZulu / Sesotho coverage yet.
Narrow domain. 7 canonical clauses out of POPIA's full surface; some adjacent areas (direct marketing under §69, automated decision-making under §71) are not separately modelled.
Hand-authored eval set (same author as train). External validation is welcome and the eval set is Apache 2.0 so you can add your own cases.
Threshold-sensitive. Per-clause F1-optimal thresholds are tuned on this holdout. If your domain's distribution differs (dialect, adversarial inputs, industry jargon), re-tune on your own held-out set — the full recipe is in the semantix-ai repo.
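
Re-tuning a threshold on your own held-out set amounts to a small search. The sketch below is a generic F1 sweep, not the semantix-ai recipe: the function names, the 0.05-step grid, and the toy scores are all hypothetical, with `scores` standing for entailment probabilities and `labels` marking which pairs are truly compliant.

```python
def f1_at(scores, labels, threshold):
    """F1 of the rule 'score >= threshold means compliant'."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_threshold(scores, labels, grid=None):
    """Return (threshold, f1) for the grid point with the highest F1."""
    grid = grid or [i / 100 for i in range(5, 100, 5)]
    # max() keeps the first maximum, so ties resolve to the lowest threshold
    return max(((t, f1_at(scores, labels, t)) for t in grid), key=lambda p: p[1])


# Toy held-out set, for illustration only
scores = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]
labels = [True, True, True, False, False, False]
threshold, f1 = best_threshold(scores, labels)
print(threshold, f1)
```

Run the sweep once per clause on data the model has not been tuned on, then pass the result as the `threshold` argument instead of the shipped default.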

Ecosystem

Training dataset: labrat-aiko/popia-compliance-nli (Apache 2.0)
Library: semantix-ai (MIT), ships POPIAJudge + 7 presets
Source: labrat-akhona/semantix-ai
Writeup: I Fine-Tuned a Compliance Judge and Beat the Stock Model by +29.6pp F1

Citation

@misc{nli-popia-v1-2026,
  author       = {Eland, Akhona},
  title        = {nli-popia-v1: A local NLI cross-encoder fine-tuned for POPIA compliance validation},
  year         = 2026,
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/labrat-aiko/nli-popia-v1}},
  note         = {Apache 2.0}
}

License

Apache 2.0. Commercial and derivative use explicitly permitted. Forking to build sibling compliance models (GDPR, HIPAA, EU AI Act, UK DPA) is actively encouraged — same recipe, different clause corpus. A link back is appreciated but not required.

