distilbert-multitask

Multi-task DistilBERT classifier for conversational AI pipelines in interactive fiction and games. Performs two classification tasks in a single forward pass:

| Task | Output | Notes |
|---|---|---|
| Dialogue act | 21-class label | Classifies player utterance type |
| Manipulation detection | Binary probability | Detects prompt injection / NPC takeover attempts |
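
The card does not describe how the two outputs share computation. A common multi-task layout, assumed here rather than confirmed, runs the DistilBERT encoder once and feeds the same pooled sentence vector (for example the first-token hidden state) to two small linear heads. One is a 21-way head for the dialogue act, normalized here with softmax. The other is a single-logit head for manipulation, passed through a sigmoid as in the Usage example. The pure-Python sketch below uses toy, hypothetical weights and only 3 classes to show that shape of computation. It is not the model's actual head.

```python
import math

def linear(vec, weights, bias):
    """Dense layer: one output per (weight row, bias) pair."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def multitask_heads(pooled, da_head, manip_head):
    """Apply both task heads to the SAME encoder output (one forward pass).

    da_head and manip_head are hypothetical (weights, bias) pairs; the real
    model's head shapes and pooling strategy are not documented.
    """
    da_probs = softmax(linear(pooled, *da_head))          # one probability per class
    manip_prob = sigmoid(linear(pooled, *manip_head)[0])  # single logit -> probability
    return da_probs, manip_prob

# Toy example: 4-dim "pooled" vector and 3 classes instead of 21.
pooled = [0.5, -1.0, 0.25, 2.0]
da_head = ([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]], [0.0, 0.0, 0.0])
manip_head = ([[0.1, 0.1, 0.1, 0.1]], [-1.0])
da_probs, manip_prob = multitask_heads(pooled, da_head, manip_head)
```

The point of the layout is that the expensive encoder runs once per utterance. Each extra task costs only one small matrix multiply.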

Dialogue Act Labels

accusation, acknowledgment, action, agree, command, conditional, confession, disagree, emote, farewell, flirt, greeting, hedge, hostile, intent, offer, opinion, out_of_character, question, statement, yes_no_question
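
The Usage example below loads `label_map_da.json` as a mapping from string indices to label names. The index order inside that file is not documented. A loader can therefore check that the map it reads is contiguous and matches the 21 labels listed above, as in this minimal sketch. The helper name `load_label_map` is hypothetical.

```python
import json

# The 21 dialogue act labels listed on this card.
EXPECTED_LABELS = frozenset({
    "accusation", "acknowledgment", "action", "agree", "command",
    "conditional", "confession", "disagree", "emote", "farewell",
    "flirt", "greeting", "hedge", "hostile", "intent", "offer",
    "opinion", "out_of_character", "question", "statement",
    "yes_no_question",
})

def load_label_map(text):
    """Parse label_map_da.json content into {index: label} and sanity-check it.

    Assumes the {"0": "label", "1": "label", ...} layout implied by the
    Usage example; the actual index order in the file is not documented.
    """
    labels = {int(k): v for k, v in json.loads(text).items()}
    if sorted(labels) != list(range(len(labels))):
        raise ValueError("label indices are not contiguous from 0")
    names = set(labels.values())
    missing, unknown = EXPECTED_LABELS - names, names - EXPECTED_LABELS
    # len check catches the same label name appearing under two indices
    if missing or unknown or len(names) != len(labels):
        raise ValueError(f"label mismatch: missing={sorted(missing)}, unknown={sorted(unknown)}")
    return labels
```

In practice you would call it as `labels = load_label_map(f.read())` inside the `with open(...)` block of the Usage example.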

Model Details

  • Base model: distilbert-base-uncased
  • Format: ONNX (CPU inference via onnxruntime)
  • Inference time: ~10–15 ms per input on CPU
  • Input: Single sentence (player utterance in a conversation)
  • Output: Dialogue act label + manipulation detection probability
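
The ~10–15 ms figure comes from this card. The CPU, the thread settings and whether tokenization is included are not stated, so you may want to measure latency on your own hardware. The sketch below is a generic timing helper that works with any zero-argument callable. The name `measure_latency_ms` and the warm-up/run counts are illustrative choices, not part of this model's tooling.

```python
import statistics
import time

def measure_latency_ms(fn, warmup=5, runs=50):
    """Time repeated calls to fn() and return (median_ms, p95_ms).

    Warm-up calls are discarded so one-off costs (e.g. lazy initialization)
    do not skew the numbers.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Simple upper-percentile estimate taken from the sorted samples.
    p95 = samples[min(len(samples) - 1, int(0.95 * len(samples)))]
    return statistics.median(samples), p95
```

You can reuse the `session` and `inputs` objects from the Usage section: `measure_latency_ms(lambda: session.run(None, dict(inputs)))` times only the ONNX forward pass. To include tokenization, wrap the tokenizer call inside the lambda as well.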

Usage

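```python
import json

import numpy as np
import onnxruntime
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

# Download the ONNX model, tokenizer files, and label map into a local folder.
snapshot_download(repo_id="myemfar/distilbert-multitask", local_dir="./distilbert_multitask")

# Run on CPU explicitly, matching the inference setup described above.
session = onnxruntime.InferenceSession(
    "./distilbert_multitask/model.onnx", providers=["CPUExecutionProvider"]
)
tokenizer = AutoTokenizer.from_pretrained("./distilbert_multitask")

# label_map_da.json maps string indices ("0", "1", ...) to dialogue act names.
with open("./distilbert_multitask/label_map_da.json") as f:
    labels = {int(k): v for k, v in json.load(f).items()}

# return_tensors="np" yields NumPy arrays that onnxruntime accepts directly.
inputs = tokenizer("Where is the tavern?", return_tensors="np")
logits_da, logits_manip = session.run(None, dict(inputs))

# Batch of one: take the first row of each output.
da_label = labels[int(np.argmax(logits_da[0]))]             # "question"
manip_prob = float(1 / (1 + np.exp(-logits_manip[0][0])))  # sigmoid
```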
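
The snippet above stops at the top label and a raw sigmoid. The small post-processing helper sketched below also reports the softmax confidence of the chosen dialogue act and applies a cut-off to the manipulation probability. The helper name `interpret` is hypothetical. The default `threshold=0.5` is a placeholder assumption, not a value from this card, so tune it on your own data. The sigmoid is written in a branch-stable form because `1 / (1 + np.exp(-x))` overflows, with a RuntimeWarning, for large negative logits.

```python
import math

def stable_sigmoid(x):
    """Logistic function that never calls exp() on a large positive number."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def interpret(da_logits, manip_logit, labels, threshold=0.5):
    """Turn the raw outputs for ONE utterance into a readable result.

    da_logits:   the 21 dialogue act logits (e.g. logits_da[0].tolist())
    manip_logit: the manipulation logit (e.g. float(logits_manip[0][0]))
    labels:      {index: label} map loaded from label_map_da.json
    threshold:   hypothetical cut-off for flagging manipulation
    """
    m = max(da_logits)
    exps = [math.exp(x - m) for x in da_logits]  # shift by max for stability
    best = max(range(len(exps)), key=exps.__getitem__)
    manip_prob = stable_sigmoid(manip_logit)
    return {
        "dialogue_act": labels[best],
        "confidence": exps[best] / sum(exps),
        "manipulation_prob": manip_prob,
        "flagged": manip_prob >= threshold,
    }
```

With the variables from the snippet above: `result = interpret(logits_da[0].tolist(), float(logits_manip[0][0]), labels)`.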

Training Data

Dialogue Act Classification

Synthetic training data generated via Claude across 21 conversational categories, curated for interactive fiction and RPG dialogue contexts. Approximately 2,000 labeled examples with targeted augmentation at category boundaries.

Manipulation / Prompt Injection Detection

Fine-tuned on a combination of three public datasets plus domain-specific negative examples (in-character RPG dialogue):

| Dataset | License | Description |
|---|---|---|
| deepset/prompt-injections | CC BY 4.0 | Benign queries + prompt injection examples |
| hackaprompt/hackaprompt-dataset | Apache 2.0 | Red-teaming competition submissions |
| lakera-ai/gandalf_ignore_instructions | CC BY 4.0 | Instruction-override attempts from Lakera's Gandalf challenge |
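
The card does not say how the three sources and the in-character negatives were mapped to one binary label, deduplicated or balanced. The sketch below is a purely hypothetical illustration of combining such sources, and none of its choices are confirmed here. It assumes each source has already been converted to `(text, label)` pairs with 1 = injection and 0 = benign. It labels every in-character line 0, drops duplicates after lowercasing and whitespace normalization, and shuffles with a fixed seed. The function names are illustrative.

```python
import random

def normalize(text):
    """Lowercase and collapse whitespace so near-identical strings collide."""
    return " ".join(text.lower().split())

def build_binary_dataset(sources, negatives, seed=0):
    """Merge (text, label) pairs from several sources plus in-character negatives.

    sources:   list of lists of (text, label); label 1 = injection, 0 = benign (assumed)
    negatives: list of in-character dialogue strings, all labeled 0
    Duplicates are dropped after normalization; the first occurrence wins, so
    source order decides which label survives a conflict.
    """
    seen = set()
    merged = []
    for rows in list(sources) + [[(text, 0) for text in negatives]]:
        for text, label in rows:
            key = normalize(text)
            if key in seen:
                continue
            seen.add(key)
            merged.append((text, int(label)))
    random.Random(seed).shuffle(merged)  # deterministic shuffle for reproducibility
    return merged
```

In-character negatives like "I draw my sword and step forward." matter here because RPG dialogue often contains commands and role-play framing that can resemble injection attempts.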

Model size: 67M parameters (F32 tensors, Safetensors format)