NCI Technique Classifier v5.2

Multi-label propaganda technique classifier based on ModernBERT, trained to identify 18 propaganda techniques from the SemEval-2020 Task 11 taxonomy.

Model Description

This model is part of the NCI (Narrative Coordination Index) Protocol for detecting coordinated influence operations. It classifies text into 18 propaganda techniques with well-calibrated probability outputs.

Key Improvements in v5.2

Reduced False Positives: False positive rate on scientific/factual content cut from 35% (v4) to 8.8%
Better Calibration: Asymmetric Loss (ASL) with clip=0.02 provides more discriminative probability outputs
Hard Negatives Training: Trained on v5 dataset with 1000+ hard negative examples (scientific, business, factual content)
Document-Level Analysis: Works well with full documents, no need for sentence-level splitting

Training Details

Base Model: answerdotai/ModernBERT-base
Dataset: synapti/nci-propaganda-v5 (24,037 samples)
Loss Function: Asymmetric Loss (ASL)
- gamma_neg: 4.0
- gamma_pos: 1.0
- clip: 0.02 (reduced from 0.05 to minimize probability shifting)
Training: 3 epochs, lr=2e-5, batch_size=16
Validation: 4/7 tests passed (57%)
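
The training code is not included in this card, so the following is only a minimal sketch of how the usual ASL formulation behaves with the hyperparameters listed above. The function names `asl_term` and `asl_loss` are hypothetical, and summing over labels is an assumed reduction. It works on plain probabilities rather than tensors to keep the arithmetic visible.

```python
import math

def asl_term(p, y, gamma_neg=4.0, gamma_pos=1.0, clip=0.02, eps=1e-8):
    """ASL contribution of one label with predicted probability p and target y (0 or 1)."""
    if y == 1:
        # Positive label: focal-style down-weighting of easy positives
        return -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
    # Negative label: shift the probability down by `clip`, so negatives
    # predicted at or below `clip` contribute nothing to the loss
    p_shifted = max(p - clip, 0.0)
    return -(p_shifted ** gamma_neg) * math.log(max(1.0 - p_shifted, eps))

def asl_loss(probs, targets, **kwargs):
    """Sum of per-label terms for one sample (assumed reduction)."""
    return sum(asl_term(p, y, **kwargs) for p, y in zip(probs, targets))

# Illustrative probabilities only, not real model output
print(asl_loss([0.90, 0.01, 0.30], [1, 0, 0]))
```

With clip=0.02, a negative label predicted at 1% adds zero loss, while a confident false positive is penalized heavily because gamma_neg is larger than gamma_pos.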

Techniques Detected

ID	Technique	Description
0	Loaded_Language	Words with strong emotional implications
1	Appeal_to_fear-prejudice	Building support through fear or prejudice
2	Exaggeration,Minimisation	Overstating or understating facts
3	Repetition	Repeating messages for reinforcement
4	Flag-Waving	Appealing to patriotism/national identity
5	Name_Calling,Labeling	Using labels to evoke prejudice
6	Reductio_ad_hitlerum	Comparing to Hitler/Nazis
7	Black-and-White_Fallacy	Presenting only two choices
8	Causal_Oversimplification	Assuming single cause for complex issues
9	Whataboutism,Straw_Men,Red_Herring	Deflection techniques
10	Straw_Man	Misrepresenting opponent's position
11	Red_Herring	Introducing irrelevant topics
12	Doubt	Questioning credibility
13	Appeal_to_Authority	Using authority figures to support claims
14	Thought-terminating_Cliches	Phrases that end rational thought
15	Bandwagon	"Everyone is doing it" appeals
16	Slogans	Catchy phrases for memorability
17	Obfuscation,Intentional_Vagueness,Confusion	Deliberately confusing language
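
Because the classifier is multi-label, a target for one text can be represented as an 18-element 0/1 vector indexed by the IDs in the table above. The sketch below assumes that representation; the helper `encode_techniques` is hypothetical and not part of the released model code.

```python
LABELS = [
    "Loaded_Language", "Appeal_to_fear-prejudice", "Exaggeration,Minimisation",
    "Repetition", "Flag-Waving", "Name_Calling,Labeling", "Reductio_ad_hitlerum",
    "Black-and-White_Fallacy", "Causal_Oversimplification",
    "Whataboutism,Straw_Men,Red_Herring", "Straw_Man", "Red_Herring", "Doubt",
    "Appeal_to_Authority", "Thought-terminating_Cliches", "Bandwagon", "Slogans",
    "Obfuscation,Intentional_Vagueness,Confusion",
]
LABEL2ID = {name: i for i, name in enumerate(LABELS)}

def encode_techniques(names):
    """Return a multi-hot list with a 1 at the ID of each named technique."""
    vec = [0] * len(LABELS)
    for name in names:
        vec[LABEL2ID[name]] = 1  # raises KeyError for an unknown technique name
    return vec

target = encode_techniques(["Doubt", "Slogans"])
print(target)
```

Here positions 12 and 16 are set and all other entries stay 0.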

Usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_id = "synapti/nci-technique-classifier-v5.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "This is OUTRAGEOUS! They are LYING to you. WAKE UP!"

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.sigmoid(outputs.logits)[0]

# Technique names, in the order of the model's output logits (IDs 0-17)
LABELS = [
    "Loaded_Language", "Appeal_to_fear-prejudice", "Exaggeration,Minimisation",
    "Repetition", "Flag-Waving", "Name_Calling,Labeling", "Reductio_ad_hitlerum",
    "Black-and-White_Fallacy", "Causal_Oversimplification",
    "Whataboutism,Straw_Men,Red_Herring", "Straw_Man", "Red_Herring", "Doubt",
    "Appeal_to_Authority", "Thought-terminating_Cliches", "Bandwagon", "Slogans",
    "Obfuscation,Intentional_Vagueness,Confusion"
]

# Print every technique whose probability exceeds 0.5
for label, prob in zip(LABELS, probs):
    if prob > 0.5:
        print(f"{label}: {prob.item():.1%}")
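
If the detected techniques are needed as data rather than printed lines, the thresholding step can be wrapped in a small helper. This is a sketch only: `select_techniques` is a hypothetical name, and the 0.5 default simply mirrors the cutoff used above rather than a tuned value.

```python
def select_techniques(probs, labels, threshold=0.5):
    """Return (label, probability) pairs above the threshold, highest first."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    hits = [(label, float(p)) for label, p in zip(labels, probs) if float(p) > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Illustrative values only, not real model output
demo_labels = ["Loaded_Language", "Doubt", "Slogans"]
demo_probs = [0.91, 0.12, 0.64]
print(select_techniques(demo_probs, demo_labels))
# → [('Loaded_Language', 0.91), ('Slogans', 0.64)]
```

With the snippet above, the call would be `select_techniques(probs.tolist(), LABELS)`.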

Performance

Validation Results

Test Case	v5.2	v4	Status
Pure Propaganda	66.8%	70.8%	✓ Detected
Neutral News	6.9%	5.5%	✓ Clean
SpaceX Factual	3.7%	-	✓ Clean
Multi-Label Propaganda	76.5%	-	✓ Detected
Mixed Content	7.3%	-	-
Fear Appeal	69.9%	-	✓ Detected
Scientific Report	8.8%	35.4%	✓ Clean

Key Metrics

Scientific Report FPR: 8.8% (vs 35.4% in v4), a 75% relative reduction
Factual News FPR: 4.6% (vs 29% in v4), an 84% relative reduction
Propaganda Detection: Maintained (73.7% max confidence on propaganda)
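
The reduction figures above are relative reductions, computed as (old - new) / old. A short check of the arithmetic, using the 35.4% scientific-report value from the validation table (the function name is illustrative):

```python
def relative_reduction(old, new):
    """Fractional decrease from old to new."""
    return (old - new) / old

print(f"Scientific report: {relative_reduction(35.4, 8.8):.0%}")  # 75%
print(f"Factual news: {relative_reduction(29.0, 4.6):.0%}")       # 84%
```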

Citation

@inproceedings{da-san-martino-etal-2020-semeval,
    title = "{S}em{E}val-2020 Task 11: Detection of Propaganda Techniques in News Articles",
    author = "Da San Martino, Giovanni and others",
    booktitle = "Proceedings of the 14th International Workshop on Semantic Evaluation",
    year = "2020",
}

License

Apache 2.0
