Armorer Guard Semantic Classifier

This repository contains the lightweight local semantic classifier artifacts used by Armorer Guard.

License

These model artifacts are public, but they are not free for commercial use.

They are released under the PolyForm Noncommercial License 1.0.0. Noncommercial research, evaluation, personal, educational, and other permitted noncommercial uses are allowed under that license. Commercial use requires a separate paid commercial license from Armorer Labs.

Commercial licensing: dev@armorerlabs.com

See LICENSE.md for the full license text.

Armorer Guard is a local-first scanner for agent inputs, model outputs, and tool calls. The classifier is a TF-IDF linear model trained on Armorer-owned synthetic development data for these semantic categories:

prompt injection
system prompt extraction
data exfiltration
sensitive data request
safety bypass
destructive command

Files

semantic_classifier_native.tsv - Rust-native exported coefficients used by the Armorer Guard binary.
semantic_classifier.onnx - ONNX export of the selected model.
semantic_classifier.joblib - scikit-learn training artifact for inspection and reproducibility.
labels.json - classifier label order.
metrics.json - validation metrics for the selected experiment.

Intended Use

Use these artifacts with Armorer Guard or compatible local scanners that need a small, no-network semantic lane for agent safety classification. The model is not a hosted API and does not require inference calls to Hugging Face.

Limitations

This is a lightweight word-ngram linear classifier, not a transformer model. It is intended as one lane in a defense-in-depth scanner alongside deterministic credential detection, policy checks, and context-aware rules.

The classifier can produce false positives on security-adjacent benign text and false negatives on novel obfuscations. Do not use it as the only enforcement mechanism for high-risk systems.

Downloads last month: -; Downloads are not tracked for this model. How to track

Space using armorer-labs/armorer-guard-semantic-classifier 1

Collection including armorer-labs/armorer-guard-semantic-classifier

Agent Safety and Prompt Injection Guardrails

Collection

Curated papers, models, datasets, and demos for AI-agent runtime safety, prompt injection, MCP security, and tool-call guardrails. • 8 items • Updated 1 day ago • 1