ReframeBot-Guardrail-DistilBERT

A 3-class DistilBERT classifier for routing ReframeBot user turns:

| Label  | Meaning                   |
|--------|---------------------------|
| TASK_1 | CBT / academic stress     |
| TASK_2 | Crisis / self-harm signal |
| TASK_3 | Out-of-scope              |

This version was retrained on data/guardrail_dataset_clean.jsonl, which merges the original guardrail data with curated hard cases covering CBT/crisis boundaries, Vietnamese text, pills/overdose language, and out-of-scope (OOS) work/mental-health informational prompts.

Current System Threshold

The ReframeBot runtime uses the classifier's full probability vector and routes to TASK_2 when:

P(TASK_2) >= 0.10

This check is applied after the academic-context/follow-up overrides and after the regex + semantic crisis detector has already run.
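
The threshold rule above can be sketched as a small routing function. This is only an illustration, not the ReframeBot runtime: the function name `route` and the constants are hypothetical, the overrides and the regex + semantic detector are not modeled, and falling back to the most probable class below the threshold is an assumption based on the "Argmax only" row in the evaluation.

```python
from typing import Mapping

CRISIS_LABEL = "TASK_2"
CRISIS_THRESHOLD = 0.10  # threshold value stated in this card


def route(probs: Mapping[str, float], threshold: float = CRISIS_THRESHOLD) -> str:
    """Return a routing label for one user turn given its class probabilities."""
    # Route to the crisis path whenever P(TASK_2) reaches the threshold,
    # even if another class has the highest probability.
    if probs.get(CRISIS_LABEL, 0.0) >= threshold:
        return CRISIS_LABEL
    # Assumption: below the threshold, fall back to the most probable class.
    return max(probs, key=probs.get)


# Illustrative (made-up) probabilities: argmax alone would pick TASK_1 here.
example = {"TASK_1": 0.80, "TASK_2": 0.12, "TASK_3": 0.08}
print(route(example))
```

With a threshold this low, the rule trades some TASK_2 precision for recall, which matches the direction of the change reported in the evaluation table.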

Evaluation

Hard out-of-domain eval set (data/evaluation_test_data.json, 60 samples):

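| Mode                    | Accuracy | TASK_2 Precision | TASK_2 Recall | TASK_2 F1 |
|-------------------------|----------|------------------|---------------|-----------|
| Argmax only             | 0.9667   | 1.0000           | 0.9048        | 0.9500    |
| Tuned P(TASK_2) >= 0.10 | 0.9833   | 0.9545           | 1.0000        | 0.9767    |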
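
As a quick consistency check, the TASK_2 F1 column can be recomputed from the reported precision and recall using the standard harmonic-mean formula. The helper name `f1_score` is hypothetical, and the inputs are the rounded values from the table, so agreement is only expected to about four decimal places.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the evaluation table.
reported = {
    "Argmax only": (1.0000, 0.9048, 0.9500),
    "Tuned P(TASK_2) >= 0.10": (0.9545, 1.0000, 0.9767),
}

for mode, (precision, recall, reported_f1) in reported.items():
    recomputed = f1_score(precision, recall)
    print(f"{mode}: recomputed={recomputed:.4f} reported={reported_f1:.4f}")
```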

Threshold sweep artifacts in the project repo:

  • reports/guardrail_threshold_sweep.csv
  • reports/guardrail_threshold_sweep.png

Usage

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Nhatminh1234/ReframeBot-Guardrail-DistilBERT",
    revision="v2-guardrail-clean",
)

classifier("I'm stressed about my final exam")

For full class probabilities:

classifier("I bought pills to overdose", top_k=None)
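
To apply a probability threshold, it can help to turn the `top_k=None` output into a label-to-score mapping. The sketch below is a hypothetical helper (`scores_to_dict` is not part of the model or of transformers). It assumes the pipeline returns a list of `{"label", "score"}` dicts, possibly wrapped in an outer list depending on the transformers version, and that the model config exposes the labels as `TASK_1`, `TASK_2`, and `TASK_3`. The scores shown are made up for illustration.

```python
from typing import Dict


def scores_to_dict(output) -> Dict[str, float]:
    """Flatten text-classification pipeline output into {label: score}."""
    # Some transformers versions wrap a single input's results in an outer list.
    if output and isinstance(output[0], list):
        output = output[0]
    return {item["label"]: float(item["score"]) for item in output}


# Made-up scores in the shape the pipeline returns with top_k=None.
fake_output = [
    {"label": "TASK_2", "score": 0.91},
    {"label": "TASK_1", "score": 0.07},
    {"label": "TASK_3", "score": 0.02},
]
probs = scores_to_dict(fake_output)
print(probs["TASK_2"] >= 0.10)
```

In real use, `fake_output` would be replaced by `classifier(text, top_k=None)`.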

Safety Note

This classifier is a routing component, not a standalone crisis intervention system. ReframeBot also uses regex + semantic crisis detection and crisis response handling around this model.

Model size: 67M params (Safetensors, tensor type F32)