ReframeBot-Guardrail-DistilBERT
A 3-class DistilBERT classifier for routing ReframeBot user turns:
| Label | Meaning |
|---|---|
| TASK_1 | CBT / academic stress |
| TASK_2 | Crisis / self-harm signal |
| TASK_3 | Out-of-scope |
This version was retrained on data/guardrail_dataset_clean.jsonl, which
merges the original guardrail data with curated hard cases covering CBT/Crisis
boundaries, Vietnamese text, pills/overdose language, and out-of-scope (OOS)
informational prompts about work and mental health.
Current System Threshold
The ReframeBot runtime uses the classifier's full probability vector and
routes to TASK_2 when:
P(TASK_2) >= 0.10
after academic-context/follow-up overrides and after the regex + semantic crisis detector has already run.
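The threshold rule above can be sketched as a small routing function. This is a minimal illustration, not the ReframeBot runtime code: the names `route`, `CRISIS_LABEL`, and `CRISIS_THRESHOLD` are hypothetical, the fallback to the highest-probability label is an assumption, and the academic-context/follow-up overrides and the regex + semantic crisis detector are not modeled.

```python
# Hypothetical names; only the 0.10 threshold on P(TASK_2) comes from the model card.
CRISIS_LABEL = "TASK_2"
CRISIS_THRESHOLD = 0.10


def route(probs, threshold=CRISIS_THRESHOLD):
    """Return the routed label for a dict mapping label -> probability."""
    # Route to the crisis path as soon as P(TASK_2) reaches the threshold,
    # even when another class has the highest probability.
    if probs.get(CRISIS_LABEL, 0.0) >= threshold:
        return CRISIS_LABEL
    # Assumption: otherwise fall back to the argmax label.
    return max(probs, key=probs.get)


# Illustrative probabilities, not real model output.
example = {"TASK_1": 0.80, "TASK_2": 0.15, "TASK_3": 0.05}
print(route(example))  # TASK_2
```

With a plain argmax the example above would be routed to TASK_1; the low threshold trades some precision for higher crisis recall.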
Evaluation
Hard out-of-domain eval set (data/evaluation_test_data.json, 60 samples):
| Mode | Accuracy | TASK_2 Precision | TASK_2 Recall | TASK_2 F1 |
|---|---|---|---|---|
| Argmax only | 0.9667 | 1.0000 | 0.9048 | 0.9500 |
| Tuned P(TASK_2) >= 0.10 | 0.9833 | 0.9545 | 1.0000 | 0.9767 |
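As a sanity check on how these columns relate, the sketch below recomputes precision, recall, and F1 from confusion counts. The counts are not stated in the model card; they are inferred from the rounded figures under the assumption that the 60-sample set contains 21 TASK_2 examples, and the helper name `prf` is hypothetical.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 for one class, rounded to 4 decimals."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return round(precision, 4), round(recall, 4), round(f1, 4)


# Inferred counts (assumption): 21 TASK_2 samples in the eval set.
argmax_only = prf(tp=19, fp=0, fn=2)  # two crisis samples missed
tuned = prf(tp=21, fp=1, fn=0)        # all caught, one false alarm

print(argmax_only)  # (1.0, 0.9048, 0.95)
print(tuned)        # (0.9545, 1.0, 0.9767)
```

Under the same assumption, two errors out of 60 gives the 0.9667 accuracy in the argmax row, and one error out of 60 gives 0.9833 in the tuned row.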
Threshold sweep artifacts in the project repo: reports/guardrail_threshold_sweep.csv and reports/guardrail_threshold_sweep.png.
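For readers unfamiliar with threshold sweeps, the following is a minimal sketch of the general technique: for each candidate threshold, mark a sample as crisis when P(TASK_2) meets it, then compute precision and recall. It is not the project's sweep script; the function name `sweep_thresholds`, the row keys, and the toy inputs are all hypothetical and do not reflect the columns of the CSV artifact.

```python
def sweep_thresholds(p_crisis, is_crisis, thresholds):
    """Compute TASK_2 precision/recall at each threshold.

    p_crisis:  list of P(TASK_2) values, one per sample
    is_crisis: list of booleans, True when the gold label is TASK_2
    """
    rows = []
    for t in thresholds:
        predicted = [p >= t for p in p_crisis]
        pairs = list(zip(predicted, is_crisis))
        tp = sum(1 for pred, gold in pairs if pred and gold)
        fp = sum(1 for pred, gold in pairs if pred and not gold)
        fn = sum(1 for pred, gold in pairs if not pred and gold)
        # Guard against empty denominators at extreme thresholds.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        rows.append({"threshold": t, "precision": precision, "recall": recall})
    return rows


# Toy inputs for illustration only, not taken from the evaluation set.
toy_probs = [0.95, 0.40, 0.12, 0.08, 0.02]
toy_gold = [True, True, True, False, False]
for row in sweep_thresholds(toy_probs, toy_gold, [0.05, 0.10, 0.50]):
    print(row)
```

Lower thresholds raise recall at the cost of precision, which is the trade-off the tuned 0.10 setting is meant to balance.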
Usage
from transformers import pipeline
classifier = pipeline(
"text-classification",
model="Nhatminh1234/ReframeBot-Guardrail-DistilBERT",
revision="v2-guardrail-clean",
)
classifier("I'm stressed about my final exam")
For full class probabilities:
classifier("I bought pills to overdose", top_k=None)
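With `top_k=None` the pipeline returns one `{"label": ..., "score": ...}` entry per class. The helper below is a small sketch for turning that output into a label-to-probability mapping so that P(TASK_2) can be read directly. The name `to_prob_dict` is hypothetical, the sample scores are made up for illustration, and the label strings are assumed to match the table above; check the model's `id2label` config if they differ.

```python
def to_prob_dict(output):
    """Convert text-classification pipeline output to {label: score}."""
    # Some call patterns (for example, passing a list of texts) return one
    # list per input; unwrap the first one in that case.
    if output and isinstance(output[0], list):
        output = output[0]
    return {item["label"]: item["score"] for item in output}


# Made-up scores in the pipeline's output shape, not real model output.
sample_output = [
    {"label": "TASK_2", "score": 0.91},
    {"label": "TASK_1", "score": 0.06},
    {"label": "TASK_3", "score": 0.03},
]

probs = to_prob_dict(sample_output)
print(probs["TASK_2"] >= 0.10)  # True
```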
Safety Note
This classifier is a routing component, not a standalone crisis intervention system. ReframeBot also uses regex + semantic crisis detection and crisis response handling around this model.
Base model: distilbert/distilbert-base-uncased