Armorer Guard Semantic Classifier
This repository contains the lightweight local semantic classifier artifacts used by Armorer Guard.
License
These model artifacts are public, but they are not free for commercial use.
They are released under the PolyForm Noncommercial License 1.0.0. Noncommercial research, evaluation, personal, educational, and other permitted noncommercial uses are allowed under that license. Commercial use requires a separate paid commercial license from Armorer Labs.
Commercial licensing: dev@armorerlabs.com
See LICENSE.md for the full license text.
Armorer Guard is a local-first scanner for agent inputs, model outputs, and tool calls. The classifier is a TF-IDF linear model trained on Armorer-owned synthetic development data for these semantic categories:
- prompt injection
- system prompt extraction
- data exfiltration
- sensitive data request
- safety bypass
- destructive command
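As a rough illustration of how a TF-IDF linear model turns text into per-category scores, here is a minimal pure-Python sketch. Everything in it is an assumption for illustration: the vocabulary, IDF values, weights, biases, whitespace tokenization, the unigram-plus-bigram range, and the per-label sigmoid output are toy choices, not the shipped model's parameters or preprocessing.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Lowercase, split on whitespace, and emit word n-grams for n = 1..n_max."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vector(text, idf):
    """Return an L2-normalized TF-IDF vector over the n-grams present in `idf`."""
    counts = Counter(g for g in word_ngrams(text) if g in idf)
    vec = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else {}


def score(text, idf, weights, bias):
    """Compute sigmoid(w . x + b) independently for each label."""
    x = tfidf_vector(text, idf)
    scores = {}
    for label, w in weights.items():
        z = bias[label] + sum(w.get(g, 0.0) * v for g, v in x.items())
        scores[label] = 1.0 / (1.0 + math.exp(-z))
    return scores


# Toy parameters (hypothetical, NOT taken from the released artifacts).
TOY_IDF = {
    "ignore": 2.0, "previous": 1.5, "instructions": 1.2,
    "ignore previous": 3.0, "rm": 2.5, "-rf": 3.0, "rm -rf": 3.5,
}
TOY_WEIGHTS = {
    "prompt injection": {"ignore previous": 4.0, "ignore": 1.0, "instructions": 1.0},
    "destructive command": {"rm -rf": 5.0, "rm": 1.0, "-rf": 2.0},
}
TOY_BIAS = {"prompt injection": -2.0, "destructive command": -2.0}

if __name__ == "__main__":
    print(score("Ignore previous instructions", TOY_IDF, TOY_WEIGHTS, TOY_BIAS))
```

With these toy values, text containing "ignore previous" pushes the prompt injection score up while the destructive command score stays at its bias-only baseline.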
Files
- semantic_classifier_native.tsv: Rust-native exported coefficients used by the Armorer Guard binary.
- semantic_classifier.onnx: ONNX export of the selected model.
- semantic_classifier.joblib: scikit-learn training artifact for inspection and reproducibility.
- labels.json: classifier label order.
- metrics.json: validation metrics for the selected experiment.
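Because labels.json records the classifier's label order, a consumer would typically use it to attach names to a positional score vector. The sketch below assumes labels.json is a flat JSON array of strings, which this README does not confirm; `parse_labels` and `name_scores` are hypothetical helper names.

```python
import json


def parse_labels(json_text):
    """Parse label JSON, assuming (unverified) a flat array of strings."""
    labels = json.loads(json_text)
    if not isinstance(labels, list) or not all(isinstance(x, str) for x in labels):
        raise ValueError("expected a JSON array of label strings")
    return labels


def name_scores(labels, scores):
    """Pair each positional score with its label, checking the lengths match."""
    if len(labels) != len(scores):
        raise ValueError(f"got {len(scores)} scores for {len(labels)} labels")
    return dict(zip(labels, scores))
```

In practice the JSON text could come from `pathlib.Path("labels.json").read_text(encoding="utf-8")`, and the score vector from whichever artifact is used for inference.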
Intended Use
Use these artifacts with Armorer Guard or compatible local scanners that need a small, no-network semantic lane for agent safety classification. The model is not a hosted API and does not require inference calls to Hugging Face.
Limitations
This is a lightweight word n-gram linear classifier, not a transformer model. It is intended as one lane in a defense-in-depth scanner, alongside deterministic credential detection, policy checks, and context-aware rules.
The classifier can produce false positives on security-adjacent benign text and false negatives on novel obfuscations. Do not use it as the only enforcement mechanism for high-risk systems.
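One way to treat the classifier as a single lane rather than the sole gate is sketched below. This is not Armorer Guard's actual decision logic: the regex patterns, the `decide` function, the three verdict strings, and both thresholds are hypothetical choices for illustration.

```python
import re

# Hypothetical deterministic lane: a couple of well-known credential shapes.
CREDENTIAL_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]


def has_credential(text):
    """Return True if any deterministic credential pattern matches."""
    return any(p.search(text) for p in CREDENTIAL_PATTERNS)


def decide(text, semantic_scores, block_at=0.9, review_at=0.5):
    """Combine the deterministic lane with semantic scores (label -> probability).

    Thresholds are illustrative assumptions, not tuned values.
    """
    # The deterministic lane wins regardless of what the classifier says.
    if has_credential(text):
        return "block"
    top = max(semantic_scores.values(), default=0.0)
    if top >= block_at:
        return "block"
    # Mid-range scores are routed to review rather than blocked outright,
    # which limits the cost of false positives on security-adjacent text.
    if top >= review_at:
        return "review"
    return "allow"
```

Keeping the deterministic check ahead of the semantic score means a classifier false negative on an obfuscated input does not suppress a credential match.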