---
license: apache-2.0
datasets:
- synapti/nci-propaganda-production
base_model: answerdotai/ModernBERT-base
tags:
- transformers
- modernbert
- text-classification
- propaganda-detection
- binary-classification
- nci-protocol
library_name: transformers
pipeline_tag: text-classification
---
# NCI Binary Detector

A fast binary classifier that detects whether a text contains propaganda techniques.

## Model Description

This model is **Stage 1** of the NCI (Narrative Credibility Index) two-stage propaganda detection pipeline:

- **Stage 1 (this model)**: Fast binary detection - "Does this text contain propaganda?"
- **Stage 2**: Multi-label technique classification - "Which specific techniques are used?"

The binary detector serves as a fast, high-recall filter that passes flagged content to the more detailed technique classifier.
## Labels

| Label | Description |
|-------|-------------|
| `no_propaganda` | Text does not contain propaganda techniques |
| `has_propaganda` | Text contains one or more propaganda techniques |
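
By default the pipeline reports only the predicted label and its score, so a downstream filter often needs the probability of `has_propaganda` specifically. The following is a minimal sketch, assuming the two class scores come from a softmax and therefore sum to 1. The helper names `propaganda_probability` and `is_flagged` and the 0.5 default threshold are illustrative and not part of the model's API.

```python
def propaganda_probability(result):
    """Return P(has_propaganda) from one pipeline result dict.

    Assumes a two-label softmax output, so the two class scores sum to 1.
    """
    if result["label"] == "has_propaganda":
        return result["score"]
    return 1.0 - result["score"]


def is_flagged(result, threshold=0.5):
    """Flag text when P(has_propaganda) reaches the threshold (hypothetical default)."""
    return propaganda_probability(result) >= threshold


# Example with a hand-written result dict (not real model output)
example = {"label": "no_propaganda", "score": 0.8}
print(round(propaganda_probability(example), 2))  # 0.2
print(is_flagged(example, threshold=0.1))  # True
```

Lowering the threshold below 0.5 trades precision for recall, which may suit the detector's role as a first-stage filter.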
## Performance

**Test Set Results:**

| Metric | Score |
|--------|-------|
| Accuracy | 99.5% |
| F1 Score | 99.6% |
| Precision | 99.2% |
| Recall | 100.0% |
| ROC AUC | 99.9% |
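
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two rows. A short sketch using only the rounded values from the table above:

```python
precision = 0.992  # Precision row of the table
recall = 1.000     # Recall row of the table

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f"F1: {f1:.1%}")  # F1: 99.6%
```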
## Usage

### Basic Usage

```python
from transformers import pipeline

# Load the binary detector as a text-classification pipeline
detector = pipeline(
    "text-classification",
    model="synapti/nci-binary-detector",
)

text = "The radical left is DESTROYING our country!"
result = detector(text)[0]  # a single string returns a one-element list

print(f"Label: {result['label']}")  # 'has_propaganda' or 'no_propaganda'
print(f"Confidence: {result['score']:.2%}")
```
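To screen many texts at once, a text-classification pipeline also accepts a list of strings and returns one result per input. The sketch below relies on that behaviour and passes `truncation=True` and `max_length=512` to stay within the 512-token maximum listed under Model Architecture. `flag_texts` is a hypothetical helper; it takes the detector as an argument so it works with any callable that behaves like the pipeline.

```python
def flag_texts(detector, texts, max_length=512):
    """Return (text, score) pairs for inputs labelled has_propaganda.

    `detector` is expected to behave like the text-classification pipeline:
    called with a list of strings, it returns one {"label", "score"} dict per input.
    """
    texts = list(texts)  # allow any iterable, including generators
    results = detector(texts, truncation=True, max_length=max_length)
    return [
        (text, result["score"])
        for text, result in zip(texts, results)
        if result["label"] == "has_propaganda"
    ]
```

With the pipeline from Basic Usage, this would be called as `flag_texts(detector, ["first text", "second text"])`.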
### Two-Stage Pipeline

For best results, pair the detector with the technique classifier:

```python
from transformers import pipeline

# Stage 1: Binary detection
detector = pipeline("text-classification", model="synapti/nci-binary-detector")

# Stage 2: Technique classification (only if propaganda detected)
# top_k=None returns a score for every technique label
classifier = pipeline(
    "text-classification",
    model="synapti/nci-technique-classifier",
    top_k=None,
)

text = "Your text to analyze..."

# Quick check first
detection = detector(text)[0]

if detection["label"] == "has_propaganda" and detection["score"] > 0.5:
    # Detailed technique analysis: keep techniques scoring above 0.3
    techniques = classifier(text)[0]
    detected = [t for t in techniques if t["score"] > 0.3]
    for t in detected:
        print(f"{t['label']}: {t['score']:.2%}")
else:
    print("No propaganda detected")
```
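The same flow can be wrapped in a reusable function. This is a sketch rather than an official API: `analyze`, its parameter names, and the returned dict layout are illustrative, and the default thresholds simply mirror the 0.5 and 0.3 values used above. Passing the two pipelines in as arguments keeps the function easy to test with stand-ins.

```python
def analyze(text, detector, classifier, detect_threshold=0.5, technique_threshold=0.3):
    """Run Stage 1, then Stage 2 only when Stage 1 flags the text."""
    detection = detector(text)[0]
    flagged = (
        detection["label"] == "has_propaganda"
        and detection["score"] > detect_threshold
    )
    if not flagged:
        # Skip the more detailed technique classifier for unflagged text
        return {"has_propaganda": False, "techniques": []}

    # The classifier is expected to return scores for every label (top_k=None)
    scores = classifier(text)[0]
    techniques = [
        (s["label"], s["score"]) for s in scores if s["score"] > technique_threshold
    ]
    return {"has_propaganda": True, "techniques": techniques}
```

With the two pipelines loaded as shown earlier, a call would look like `analyze("Your text to analyze...", detector, classifier)`.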
## Training Data

Trained on [synapti/nci-propaganda-production](https://huggingface.co/datasets/synapti/nci-propaganda-production):

- **23,000+ examples** from multiple sources
- **Positive examples**: Text with one or more propaganda techniques (from SemEval-2020 and augmented data)
- **Hard negatives**: Factual content from the LIAR2 and QBias datasets
- **Class-weighted focal loss** (gamma=2.0) to handle class imbalance

## Model Architecture

- **Base Model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- **Parameters**: 149.6M
- **Max Sequence Length**: 512 tokens
- **Output**: 2 labels (binary classification)

## Training Details

- **Loss Function**: Focal Loss (gamma=2.0, alpha=0.25)
- **Optimizer**: AdamW
- **Learning Rate**: 2e-5
- **Batch Size**: 16 (effective 32 with gradient accumulation)
- **Epochs**: 5 with early stopping (patience=3)
- **Hardware**: NVIDIA A10G GPU
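
For readers unfamiliar with focal loss, the sketch below computes it for a single example in plain Python using the standard formulation, FL = -alpha_t * (1 - p_t)^gamma * log(p_t). It is not the training code for this model: the function name is hypothetical, and applying `alpha` to the `has_propaganda` class (with `1 - alpha` for `no_propaganda`) is an assumption, since the card does not state which class it weights.

```python
import math


def binary_focal_loss(p_positive, target, gamma=2.0, alpha=0.25):
    """Focal loss for one example.

    p_positive: predicted probability of the positive class (has_propaganda)
    target: 1 for has_propaganda, 0 for no_propaganda
    """
    # p_t is the probability the model assigned to the true class
    p_t = p_positive if target == 1 else 1.0 - p_positive
    # Assumption: alpha weights the positive class, 1 - alpha the negative class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = max(p_t, 1e-12)  # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.95, target=1)  # confidently correct
hard = binary_focal_loss(0.30, target=1)  # mostly wrong
print(easy < hard)  # True
```

The `(1 - p_t) ** gamma` factor shrinks the loss on confidently correct examples, so training concentrates on the hard ones.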
## Limitations

- Trained primarily on English text
- Works best on content similar to the training distribution (news articles, social media posts)
- May miss subtle or novel propaganda techniques that are absent from the training data
- Should be used alongside human review for high-stakes applications

## Related Models

- [synapti/nci-technique-classifier](https://huggingface.co/synapti/nci-technique-classifier) - Stage 2 multi-label technique classifier
## Citation

```bibtex
@inproceedings{da-san-martino-etal-2020-semeval,
  title = "{S}em{E}val-2020 Task 11: Detection of Propaganda Techniques in News Articles",
  author = "Da San Martino, Giovanni and others",
  booktitle = "Proceedings of SemEval-2020",
  year = "2020",
}

@misc{nci-binary-detector,
  author = {NCI Protocol Team},
  title = {NCI Binary Detector},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/synapti/nci-binary-detector}
}
```

## License

Apache 2.0