modernbert-primary_topic

Fine-tuned ModernBERT-base classifier that assigns a primary legal topic to a multi-turn conversation.

Part of the Legal QA collection · Try the interactive demo →

Model description

Stage 2 of a two-model encoder routing pipeline:

| Stage | Model | Input | Output |
| --- | --- | --- | --- |
| 1 | modernbert-seeks_guidance | Full conversation | seeks_legal_guidance (True/False) |
| 2 | modernbert-primary_topic | User turns only | Primary topic (14 labels + non-guidance) |

modernbert-primary_topic predicts one of 14 legal topic labels, plus a (non-guidance) class for conversations where the user is not seeking legal help. In practice, run modernbert-seeks_guidance first and only trust the topic label when it predicts True.
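The gating described above can be sketched as a small wrapper. This is a minimal illustration, not the author's pipeline: `route`, `stub_stage1`, and `stub_stage2` are hypothetical names, and the two stubs stand in for the real models so the control flow can be read without loading any weights.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]


def route(
    conversation: List[Message],
    seeks_guidance: Callable[[List[Message]], bool],
    primary_topic: Callable[[List[Message]], str],
) -> Optional[str]:
    """Return a topic only when stage 1 flags the conversation as seeking legal guidance."""
    if not seeks_guidance(conversation):
        return None  # stage 2 is skipped, so no topic label is reported
    return primary_topic(conversation)


# Hypothetical stand-ins for the two models, used only to exercise the gate.
def stub_stage1(conversation: List[Message]) -> bool:
    return any(
        "landlord" in m["content"].lower()
        for m in conversation
        if m["role"] == "user"
    )


def stub_stage2(conversation: List[Message]) -> str:
    return "HOUSING"


legal = [{"role": "user", "content": "Can my landlord evict me without notice?"}]
chitchat = [{"role": "user", "content": "Write me a poem about autumn."}]

print(route(legal, stub_stage1, stub_stage2))     # HOUSING
print(route(chitchat, stub_stage1, stub_stage2))  # None
```

In a real deployment the two callables would wrap the stage 1 and stage 2 classifiers; returning `None` keeps an untrusted topic label from leaking downstream.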

Input preprocessing: only user messages are kept, each serialized on its own line as "User: <content>"; assistant turns are dropped.

Topic taxonomy

| Topic | Description |
| --- | --- |
| FAMILY | Marriage, divorce, child custody, child support, alimony, adoption, guardianship, domestic violence, parentage, family-status disputes. |
| HOUSING | Rent, eviction, landlord-tenant disputes, habitability, deposits, mortgages, foreclosure, neighbors, housing subsidies. |
| WORK | Employment contracts, wages, dismissal, discrimination at work, leave, workplace safety, severance, freelancers when the main issue is labor rights. |
| PUBLIC_BENEFITS | Unemployment benefits, disability, pensions, welfare, public assistance, eligibility, reductions, sanctions, appeals on benefits. |
| CRIMINAL_JUSTICE | Police, arrest, criminal charges, fines, prosecution, defense, victims' rights, probation, criminal procedure. |
| CONSUMER_DEBT | Purchases, warranties, subscriptions, refunds, scams, debt collection, loans, bankruptcy, credit, repossession, consumer finance. |
| CONTRACTS | Private civil agreements and breach/interpretation issues not better covered by work, housing, consumer, or business. |
| IMMIGRATION | Visas, residence permits, asylum, citizenship, deportation, family migration, immigration status and related procedures. |
| BUSINESS | Company formation, shareholder issues, commercial compliance, business operations, B2B disputes, self-employment when the main issue is business law. |
| DATA_PRIVACY | Personal data, surveillance, GDPR/privacy rights, data deletion, consent, monitoring, platform data practices. |
| INTELLECTUAL_PROPERTY | Copyright, trademark, patent, trade secrets, licensing, infringement, ownership of creative or technical works. |
| CIVIL_RIGHTS | Discrimination outside employment/housing, free speech, due process, equal treatment, constitutional or human-rights style claims. |
| INTERNATIONAL_CROSS_BORDER | Choice of law, jurisdiction, treaty-based questions, cross-border enforcement, multi-country disputes where cross-border law is central. |
| OTHER | Genuinely legal but not covered above. |

Selection rules: use CONTRACTS only when the issue is mainly about a civil agreement and is not better captured by WORK, HOUSING, CONSUMER_DEBT, or BUSINESS. Use INTERNATIONAL_CROSS_BORDER only when the cross-border or jurisdictional aspect is central, not merely incidental.
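For readers who post-process or audit labels, the two selection rules can be written as a filter over candidate topics. This is only an illustrative sketch: `apply_selection_rules` and the `cross_border_central` flag are hypothetical, and the rules were applied at labeling time rather than by any code shipped with this model.

```python
# Topics that take precedence over CONTRACTS, per the selection rules.
SPECIFIC_OVER_CONTRACTS = {"WORK", "HOUSING", "CONSUMER_DEBT", "BUSINESS"}


def apply_selection_rules(candidates, cross_border_central=False):
    """Filter a set of plausible topics using the two selection rules."""
    remaining = set(candidates)

    # Rule 1: CONTRACTS yields to a more specific topic when one applies.
    if "CONTRACTS" in remaining and remaining & SPECIFIC_OVER_CONTRACTS:
        remaining.discard("CONTRACTS")

    # Rule 2: INTERNATIONAL_CROSS_BORDER is kept only when the cross-border
    # or jurisdictional aspect is central (a judgment the caller supplies).
    if not cross_border_central:
        remaining.discard("INTERNATIONAL_CROSS_BORDER")

    return remaining


print(apply_selection_rules({"CONTRACTS", "HOUSING"}))  # {'HOUSING'}
print(apply_selection_rules({"CONTRACTS"}))             # {'CONTRACTS'}
```

The function narrows the candidates but does not choose among several remaining topics; that decision stays with the labeler or the classifier.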

Results

| Split | N | Accuracy | Precision | Recall | F1 |
| --- | --- | --- | --- | --- | --- |
| Validation (best checkpoint) | 106 | 77.36% | 76.81% | 77.36% | 76.64% |
| Test (held-out) | 107 | 76.64% | 78.62% | 76.64% | 76.28% |
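In both rows the Recall value equals Accuracy, which is what support-weighted averaging produces. The sketch below assumes the reported precision, recall, and F1 are support-weighted (the training settings mention weighted F1 for checkpoint selection, but the averaging of the other columns is not stated). The function name `weighted_metrics` and the toy labels are illustrative, not taken from the evaluation code.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Support-weighted precision, recall, and F1, plus plain accuracy."""
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n  # each class is weighted by its share of true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, not drawn from the dataset.
y_true = ["HOUSING", "HOUSING", "WORK", "FAMILY"]
y_pred = ["HOUSING", "WORK", "WORK", "FAMILY"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics["accuracy"], metrics["recall"])  # 0.75 0.75
```

Weighted recall sums (support / N) × (true positives / support) over classes, which reduces to total true positives over N, so it matches accuracy by construction.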

Usage

Pipeline (recommended)

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

def serialize(messages, input_mode="full"):
    lines = []
    for msg in messages:
        role = msg["role"]
        if input_mode == "user" and role != "user":
            continue
        lines.append(f"{role.capitalize()}: {msg['content']}")
    return "\n".join(lines)

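def predict(model_id, text):
    # Loads the tokenizer and model on every call; cache them when classifying many texts.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    enc = tokenizer(text, truncation=True, max_length=4096, return_tensors="pt")
    with torch.no_grad():
        pred_id = model(**enc).logits.argmax(dim=-1).item()
    # id2label keys are ints once the config is loaded, so index with the int id directly.
    label = model.config.id2label[pred_id]
    # An empty label corresponds to the non-guidance class (primary_topic is empty for those rows).
    return label or "(non-guidance)"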

conversation = [
    {"role": "user", "content": "Can my landlord evict me without notice?"},
    {"role": "assistant", "content": "Eviction rules depend on your jurisdiction..."},
    {"role": "user", "content": "I'm in California on a month-to-month lease."},
]

topic = predict(
    "AmirMohseni/modernbert-primary_topic",
    serialize(conversation, input_mode="user"),
)
print(topic)  # e.g. HOUSING

Pair with modernbert-seeks_guidance for the full routing pipeline (see that model card for a complete example).

Single model

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="AmirMohseni/modernbert-primary_topic",
)
text = "User: Can my landlord evict me?\nUser: I'm in California on a month-to-month lease."
print(classifier(text))

Intended uses & limitations

Use for: assigning a topic label to user queries already flagged as seeking legal guidance.

Do not use for: legal advice, or as a standalone filter for legal intent (use the seeks_guidance model first).

Caveats: silver labels from GPT-5.4; user-turn-only input discards assistant context; English only.

Training data

Dataset: AmirMohseni/WildChat-Legal-Classification-V2-Balanced

  • Balanced legal / non-legal rows from WildChat-1M with GPT-5.4 structured labels
  • Splits: train 1909 · val 106 · test 107
  • Target field: primary_topic (empty for non-guidance rows)

Training procedure

| Setting | Value |
| --- | --- |
| Base model | answerdotai/ModernBERT-base |
| Input mode | User turns only |
| Max length | 4096 |
| Learning rate | 5e-5 |
| Epochs | 8 |
| Effective batch size | 64 (8 × 8 grad accum) |
| Best checkpoint | Highest weighted F1 on validation |

Full training log

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 12.5926 | 0.33 | 10 | 1.7831 | 0.4811 | 0.2359 | 0.4811 | 0.3166 |
| 8.7973 | 1.0 | 30 | 1.1594 | 0.6981 | 0.6908 | 0.6981 | 0.6815 |
| 5.7850 | 1.33 | 40 | 0.9966 | 0.7358 | 0.7646 | 0.7358 | 0.7405 |
| 2.2375 | 3.0 | 90 | 0.8301 | 0.7170 | 0.7588 | 0.7170 | 0.7290 |
| 0.0142 | 6.67 | 200 | 0.8931 | 0.7736 | 0.7681 | 0.7736 | 0.7664 |
| 0.0036 | 8.0 | 240 | 0.9106 | 0.7736 | 0.7720 | 0.7736 | 0.7650 |
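The step counts in the log follow from the training settings. The arithmetic below is a quick check, assuming a single device and that the final partial batch of each epoch still counts as one optimizer step; the variable names are illustrative.

```python
import math

train_rows = 1909        # train split size
per_device_batch = 8     # from "8 × 8 grad accum"
grad_accum_steps = 8
epochs = 8

effective_batch = per_device_batch * grad_accum_steps      # 64
steps_per_epoch = math.ceil(train_rows / effective_batch)  # 30
total_steps = steps_per_epoch * epochs                     # 240

print(effective_batch, steps_per_epoch, total_steps)  # 64 30 240
print(round(200 / steps_per_epoch, 2))                # 6.67
```

This matches the log: step 30 closes epoch 1, the best checkpoint at step 200 falls at epoch 6.67, and training ends at step 240.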

Framework versions

  • Transformers 5.8.1 · PyTorch 2.10.0 · Datasets 4.8.5