ModernBERT DGA Detector

This model classifies domain names as either legitimate or generated by a Domain Generation Algorithm (DGA).

Model Description

Model Type: BERT-based sequence classification
Task: Binary classification (Legitimate vs DGA domains)
Base Model: ModernBERT-base
Training Data: Domain names dataset
Authors: Reynier Leyva La O, Carlos A. Catania

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Reynier/modernbert-dga-detector")
model = AutoModelForSequenceClassification.from_pretrained("Reynier/modernbert-dga-detector")

# Classify a single domain; label indices follow this card (0=LEGITIMATE, 1=DGA)
def predict_domain(domain):
    inputs = tokenizer(domain, return_tensors="pt", max_length=64, truncation=True, padding=True)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    # Convert raw logits to class probabilities
    predictions = torch.softmax(outputs.logits, dim=-1)
    legit_prob = predictions[0][0].item()
    dga_prob = predictions[0][1].item()
    return {
        "prediction": "DGA" if dga_prob > legit_prob else "LEGITIMATE",
        "confidence": max(legit_prob, dga_prob),
    }

# Test examples
domains = ["google.com", "xkvbzpqr.net", "facebook.com", "abcdef123456.com"]
for domain in domains:
    result = predict_domain(domain)
    print(f"{domain} -> {result['prediction']} (confidence: {result['confidence']:.3f})")

Model Architecture

The model is based on ModernBERT and fine-tuned for domain classification:

Input: Domain names (text)
Output: Binary classification (0=LEGITIMATE, 1=DGA)
Max sequence length: 64 tokens
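
The label mapping above can be illustrated with a minimal sketch in plain Python. It assumes a two-element logit vector in the order [LEGITIMATE, DGA]. The helper name `label_from_logits` and the `dga_threshold` parameter are hypothetical; they are not part of this model or of the Transformers API.

```python
import math

# Label order as stated in the card: index 0 = LEGITIMATE, index 1 = DGA.
ID2LABEL = {0: "LEGITIMATE", 1: "DGA"}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_from_logits(logits, dga_threshold=0.5):
    """Map two raw logits to a label and the probability of that label.

    dga_threshold is a hypothetical knob: raising it flags fewer domains.
    With two classes, 0.5 gives the same decision as comparing the two
    probabilities directly.
    """
    probs = softmax(logits)
    index = 1 if probs[1] > dga_threshold else 0
    return {"prediction": ID2LABEL[index], "confidence": probs[index]}

print(label_from_logits([0.0, 2.0]))
```

A stricter threshold, for example `label_from_logits(logits, dga_threshold=0.95)`, may suit deployments where false positives are costly, though the right value would have to be chosen on validation data.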

Training Details

This model was fine-tuned on a dataset of legitimate and DGA-generated domains using:

Base model: answerdotai/ModernBERT-base
Framework: Transformers/PyTorch
Task: Binary sequence classification

Performance

Reported evaluation metrics:

Accuracy: 0.9658 ± 0.0153
Precision: 0.9704 ± 0.0253
Recall: 0.9582 ± 0.0147
F1-Score: 0.9579 ± 0.0167
FPR: 0.0267 ± 0.0233
TPR: 0.9582 ± 0.0147
Query Time: 0.1226 ± 0.0253 on CPU (no GPU required)
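
For readers less familiar with these metrics, the sketch below shows how they are conventionally derived from confusion-matrix counts, treating DGA as the positive class. It also shows why TPR and Recall are the same quantity. The function name `binary_metrics` and the counts are illustrative assumptions, not the evaluation code or data behind the numbers above, and the function assumes every denominator is non-zero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary metrics from confusion-matrix counts.

    tp/fn count DGA domains (positive class), tn/fp count legitimate ones.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall and TPR are the same ratio
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),
        "tpr": recall,
    }

# Illustrative counts only, not the model's evaluation data.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```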

Use Cases

Cybersecurity: Detect malicious domains generated by malware
Network Security: Filter potentially harmful domains
Threat Intelligence: Analyze domain patterns in security feeds

Limitations

This model is trained specifically for domain classification
Performance may vary on domains from different TLDs or languages
Regular retraining may be needed as DGA techniques evolve
Model performance depends on the quality and diversity of training data

Citation

If you use this model in your research or applications, please cite it appropriately.

Related Models

Check out the author's other security models:

Llama3_8B-DGA-Detector

Collection including Reynier/modernbert-dga-detector: DGA Multi-Family Benchmark

8 DGA detection models (CNN, BiLSTM, Bilbo, LABin, Logit, FANCI, DomURLsBERT, ModernBERT) trained on 54 malware families.