	---
	license: apache-2.0
	datasets:
	- synapti/nci-propaganda-production
	base_model: answerdotai/ModernBERT-base
	tags:
	- transformers
	- modernbert
	- text-classification
	- propaganda-detection
	- binary-classification
	- nci-protocol
	library_name: transformers
	pipeline_tag: text-classification
	---

	# NCI Binary Detector

	Fast binary classifier that detects whether text contains propaganda techniques.

	## Model Description

	This model is Stage 1 of the NCI (Narrative Credibility Index) two-stage propaganda detection pipeline:

	- Stage 1 (this model): Fast binary detection - "Does this text contain propaganda?"
	- Stage 2: Multi-label technique classification - "Which specific techniques are used?"

	The binary detector serves as a fast filter with high recall, passing flagged content to the more detailed technique classifier.

	## Labels

	| Label | Description |
	|-------|-------------|
	| `no_propaganda` | Text does not contain propaganda techniques |
	| `has_propaganda` | Text contains one or more propaganda techniques |

	## Performance

	Test Set Results:

	| Metric | Score |
	|--------|-------|
	| Accuracy | 99.5% |
	| F1 Score | 99.6% |
	| Precision | 99.2% |
	| Recall | 100.0% |
	| ROC AUC | 99.9% |
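
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two rows. A minimal sketch (the function name `f1_from_pr` is illustrative and not part of the model's code):

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall from the table above, expressed as fractions.
f1 = f1_from_pr(0.992, 1.000)
print(f"{f1:.1%}")  # 99.6%
```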

	## Usage

	### Basic Usage

	```python
	from transformers import pipeline

	detector = pipeline(
	    "text-classification",
	    model="synapti/nci-binary-detector",
	)

	text = "The radical left is DESTROYING our country!"
	result = detector(text)[0]

	print(f"Label: {result['label']}") # 'has_propaganda' or 'no_propaganda'
	print(f"Confidence: {result['score']:.2%}")
	```

	### Two-Stage Pipeline

	For best results, use this model together with the technique classifier:

	```python
	from transformers import pipeline

	# Stage 1: Binary detection
	detector = pipeline("text-classification", model="synapti/nci-binary-detector")

	# Stage 2: Technique classification (only if propaganda detected)
	classifier = pipeline("text-classification", model="synapti/nci-technique-classifier", top_k=None)

	text = "Your text to analyze..."

	# Quick check first
	detection = detector(text)[0]
	if detection["label"] == "has_propaganda" and detection["score"] > 0.5:
	    # Detailed technique analysis
	    techniques = classifier(text)[0]
	    detected = [t for t in techniques if t["score"] > 0.3]
	    for t in detected:
	        print(f"{t['label']}: {t['score']:.2%}")
	else:
	    print("No propaganda detected")
	```
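
Because the detector is meant to act as a high-recall filter, it can be convenient to threshold the probability of `has_propaganda` directly rather than the score of whichever label ranked first. The sketch below assumes the pipeline returns the softmax score of the top label for a two-label model, so the other label's probability is one minus that score. `propaganda_probability` and `needs_stage_two` are hypothetical helper names, and the thresholds shown are illustrative, not tuned values.

```python
def propaganda_probability(detection: dict) -> float:
    """Return P(has_propaganda) from a single pipeline result dict.

    Assumes a two-label softmax output, so the two probabilities sum to 1.
    """
    score = float(detection["score"])
    return score if detection["label"] == "has_propaganda" else 1.0 - score


def needs_stage_two(detection: dict, threshold: float = 0.5) -> bool:
    """Decide whether to pass the text on to the technique classifier."""
    return propaganda_probability(detection) > threshold


# Results shaped like the pipeline output, written by hand for illustration.
examples = [
    {"label": "has_propaganda", "score": 0.97},
    {"label": "no_propaganda", "score": 0.70},
]
decisions = [needs_stage_two(d, threshold=0.25) for d in examples]
print(decisions)
```

Lowering the threshold sends more borderline texts to Stage 2, trading precision for recall. A suitable value would need to be chosen on held-out data.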

	## Training Data

	Trained on [synapti/nci-propaganda-production](https://huggingface.co/datasets/synapti/nci-propaganda-production):

	- 23,000+ examples from multiple sources
	- Positive examples: Text with 1+ propaganda techniques (from SemEval-2020, augmented data)
	- Hard negatives: Factual content from LIAR2, QBias datasets
	- Class-weighted Focal Loss to handle imbalance (gamma=2.0)

	## Model Architecture

	- Base Model: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
	- Parameters: 149.6M
	- Max Sequence Length: 512 tokens
	- Output: 2 labels (binary classification)
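
Inputs longer than the 512-token limit have to be truncated or split before classification. One option is to split the token sequence into overlapping windows and score each window separately. The helper below is a hypothetical sketch that operates on a plain list of token IDs (for example, the output of a tokenizer). The 64-token overlap is an arbitrary choice, not a setting from this model.

```python
def split_into_windows(token_ids: list[int], max_len: int = 512,
                       overlap: int = 64) -> list[list[int]]:
    """Split token IDs into windows of at most max_len with a fixed overlap."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be in [0, max_len)")
    if len(token_ids) <= max_len:
        return [token_ids]
    step = max_len - overlap
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # the last window already reaches the end
    return windows


# 1,000 dummy token IDs stand in for a long tokenized document.
windows = split_into_windows(list(range(1000)))
print([len(w) for w in windows])  # [512, 512, 104]
```

In practice `max_len` should leave room for any special tokens the tokenizer adds around each window.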

	## Training Details

	- Loss Function: Focal Loss (gamma=2.0, alpha=0.25)
	- Optimizer: AdamW
	- Learning Rate: 2e-5
	- Batch Size: 16 (effective 32 with gradient accumulation)
	- Epochs: 5 with early stopping (patience=3)
	- Hardware: NVIDIA A10G GPU
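
For reference, focal loss down-weights examples the model already classifies confidently. The sketch below is a plain-Python version of the standard binary formulation, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), using the gamma and alpha values listed above. It illustrates the formula only and is not the training code used for this model. It assumes alpha applies to the positive class and 1 - alpha to the negative class.

```python
import math


def focal_loss(p_positive: float, target: int,
               gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for one example.

    p_positive: predicted probability of the positive class.
    target: 1 for the positive class, 0 for the negative class.
    """
    eps = 1e-12  # guards against log(0)
    p_t = p_positive if target == 1 else 1.0 - p_positive
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confident correct prediction contributes far less than a hard one.
easy = focal_loss(0.95, target=1)
hard = focal_loss(0.30, target=1)
print(f"easy={easy:.6f} hard={hard:.6f}")
```

With `gamma=0` and `alpha=1` the expression reduces to ordinary cross-entropy for the positive class, which makes the effect of the modulating factor easy to compare.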

	## Limitations

	- Trained primarily on English text
	- Works best on content similar to training distribution (news articles, social media posts)
	- May not detect subtle or novel propaganda techniques not in training data
	- Should be used alongside human review for high-stakes applications

	## Related Models

	- [synapti/nci-technique-classifier](https://huggingface.co/synapti/nci-technique-classifier) - Stage 2 multi-label technique classifier

	## Citation

	```bibtex
	@inproceedings{da-san-martino-etal-2020-semeval,
	  title = "{S}em{E}val-2020 Task 11: Detection of Propaganda Techniques in News Articles",
	  author = "Da San Martino, Giovanni and others",
	  booktitle = "Proceedings of SemEval-2020",
	  year = "2020",
	}

	@misc{nci-binary-detector,
	  author = {NCI Protocol Team},
	  title = {NCI Binary Detector},
	  year = {2024},
	  publisher = {HuggingFace},
	  url = {https://huggingface.co/synapti/nci-binary-detector}
	}
	```

	## License

	Apache 2.0