Social Engineering Detection Model
A text classifier that detects social engineering attacks in emails, SMS, and other text messages.
Architecture
Multi-kernel CNN: Embedding(64) → Conv1D(3-gram, 64) + Conv1D(5-gram, 64) → Concat → Dense(64) → Dense(32) → Sigmoid
Total Parameters: 1,323,265 (5.05 MB)
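The reported parameter count can be reproduced layer by layer, which also shows that the embedding table holds roughly 97% of the weights. This is a sketch under assumptions the card does not state explicitly: a vocabulary of 20,000 tokens (taken from vocabulary.json), a parameter-free global pooling layer after each Conv1D branch, and a final Dense(1) layer producing the sigmoid output.

```python
# Sketch: recompute the parameter count of the multi-kernel CNN.
# ASSUMPTIONS: vocab size 20,000, parameter-free global pooling after each
# Conv1D branch, and a Dense(1) output layer with sigmoid activation.

VOCAB_SIZE = 20_000
EMBED_DIM = 64
FILTERS = 64


def conv1d_params(kernel_size: int, in_channels: int, filters: int) -> int:
    # kernel weights (kernel_size * in_channels * filters) + one bias per filter
    return kernel_size * in_channels * filters + filters


def dense_params(inputs: int, units: int) -> int:
    # weight matrix + bias vector
    return inputs * units + units


layers = {
    "embedding": VOCAB_SIZE * EMBED_DIM,
    "conv1d_3gram": conv1d_params(3, EMBED_DIM, FILTERS),
    "conv1d_5gram": conv1d_params(5, EMBED_DIM, FILTERS),
    # pooled branches are concatenated: 2 * 64 = 128 features
    "dense_64": dense_params(2 * FILTERS, 64),
    "dense_32": dense_params(64, 32),
    "output_sigmoid": dense_params(32, 1),
}

total = sum(layers.values())
size_mib = total * 4 / 2**20  # float32 weights, 4 bytes each, 2**20 bytes per MB

for name, count in layers.items():
    print(f"{name:>16}: {count:>9,}")
print(f"{'total':>16}: {total:>9,} ({size_mib:.2f} MB)")
```

Under these assumptions the total matches the card's 1,323,265 parameters and 5.05 MB, so the sketch is at least consistent with the stated architecture.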
Performance
| Metric | Value |
|---|---|
| Accuracy | 0.9844 |
| AUC | 0.9986 |
| Precision | 0.9854 |
| Recall | 0.9812 |
| Loss | 0.0456 |
Trained for 6 epochs with early stopping.
Training Data
50,112 samples from 3 combined datasets:
- SetFit/enron_spam: 33,716 corporate emails (Enron corpus, ham/spam)
- ucirvine/sms_spam: 5,574 SMS messages (UCI ML Repository)
- Deysi/spam-detection-dataset: ~10,900 modern text samples
Class distribution: ~53% legitimate, ~47% malicious
Files
| File | Description |
|---|---|
| social_engineering_detector.keras | Keras native format (recommended) |
| social_engineering_detector.h5 | HDF5 format (legacy compatible) |
| vocabulary.json | Tokenizer vocabulary (20,000 tokens) |
| vectorizer_config.json | Text vectorization settings |
| metrics.json | Full evaluation metrics |
| training_history.json | Training curves data |
Quick Start
Load the .h5 Model
```python
import tensorflow as tf

# Load the model; the TextVectorization layer is part of the saved model
model = tf.keras.models.load_model("social_engineering_detector.h5")

# Raw strings go straight in, no separate tokenization step is needed
messages = [
    "URGENT: Your account compromised! Click to verify: http://fake-bank.xyz",
    "Hey, are we meeting for lunch tomorrow?",
    "You won $10,000! Send SSN to claim.",
    "Please review the Q3 report attached.",
]
predictions = model.predict(tf.constant(messages))

for msg, pred in zip(messages, predictions):
    score = float(pred[0])  # probability that the message is social engineering
    label = "🚨 SOCIAL ENGINEERING" if score > 0.5 else "✅ LEGITIMATE"
    confidence = score if score > 0.5 else 1 - score
    print(f"{label} ({confidence:.1%}): {msg}")
```
Download from HuggingFace Hub
```python
from huggingface_hub import hf_hub_download
import tensorflow as tf

# Download the .h5 model
path = hf_hub_download("DhruvSoni/social-engineering-detector", "social_engineering_detector.h5")
model = tf.keras.models.load_model(path)

# Or download the .keras model
path = hf_hub_download("DhruvSoni/social-engineering-detector", "social_engineering_detector.keras")
model = tf.keras.models.load_model(path)
```
What It Detects
- 📧 Phishing emails: fake account alerts, credential harvesting
- 📱 SMS phishing (smishing): malicious links in text messages
- 💰 Advance-fee fraud: Nigerian prince scams, lottery wins
- 🏦 Brand impersonation: fake PayPal, Apple, Microsoft, and bank messages
- ⚡ Urgency-based attacks: deadline pressure, account suspension threats
- 🛒 Purchase scams: too-good-to-be-true offers, fake products
Model Input/Output
- Input: Raw text string (email body, SMS message, chat message)
- Output: A single float between 0 and 1 (> 0.5 = social engineering attack, ≤ 0.5 = legitimate)
- Preprocessing: Built-in TextVectorization layer handles tokenization automatically
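If you need the token IDs outside TensorFlow (for debugging or a non-Keras runtime), the built-in layer's default behavior can be approximated in plain Python. This is a hedged sketch, not the layer itself: it assumes the default TextVectorization settings (lowercase, strip ASCII punctuation, split on whitespace, integer output), the 200-token limit from Training Details, and that vocabulary.json is a JSON list in the order returned by get_vocabulary() (index 0 = padding "", index 1 = "[UNK]"). The function names are hypothetical; check vectorizer_config.json for the settings actually used.

```python
import json
import string

MAX_LEN = 200  # max sequence length from Training Details

# Default standardization removes these characters (same set as string.punctuation)
_STRIP = str.maketrans("", "", string.punctuation)


def load_vocab(path: str = "vocabulary.json") -> list[str]:
    # ASSUMPTION: the file is a JSON list of tokens, index 0 = "", index 1 = "[UNK]"
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def vectorize(text: str, vocab: list[str], max_len: int = MAX_LEN) -> list[int]:
    """Approximate TextVectorization(output_mode="int") with default settings."""
    index = {token: i for i, token in enumerate(vocab)}
    tokens = text.lower().translate(_STRIP).split()
    ids = [index.get(tok, 1) for tok in tokens][:max_len]  # unknown words map to 1
    return ids + [0] * (max_len - len(ids))  # right-pad with 0 up to max_len
```

For example, with a toy vocabulary `["", "[UNK]", "click", "verify", "your", "account"]`, `vectorize("Click to VERIFY your account!", vocab, max_len=8)` gives `[2, 1, 3, 4, 5, 0, 0, 0]`. Results may differ from the real layer for non-ASCII text.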
Training Details
- Framework: TensorFlow 2.21 / Keras 3.14
- Optimizer: Adam (lr=1e-3 with ReduceLROnPlateau)
- Loss: Binary cross-entropy
- Batch size: 128
- Max sequence length: 200 tokens
- Early stopping: monitors val_auc with patience=4
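The early-stopping rule can be sketched in plain Python to show what patience=4 means in practice. This assumes Keras EarlyStopping semantics with mode="max", min_delta=0, and no baseline; the function name and the val_auc values in the example are hypothetical, not taken from training_history.json.

```python
def early_stopping_epoch(val_auc_history: list[float], patience: int = 4) -> tuple[int, int]:
    """Return (stop_epoch, best_epoch), both 1-based, for max-mode early stopping.

    Training stops once `patience` consecutive epochs pass without val_auc
    strictly exceeding the best value seen so far.
    """
    best, best_epoch, wait = float("-inf"), 0, 0
    for epoch, auc in enumerate(val_auc_history, start=1):
        if auc > best:
            best, best_epoch, wait = auc, epoch, 0  # new best: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_auc_history), best_epoch  # ran out of epochs without stopping
```

With hypothetical values `[0.990, 0.995, 0.994, 0.995, 0.993, 0.994]`, the best score occurs at epoch 2 and epochs 3 to 6 bring no strict improvement, so the rule stops at epoch 6 and returns `(6, 2)`.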
Confusion Matrix (Test Set)
| | Predicted Legitimate | Predicted Malicious |
|---|---|---|
| Actual Legitimate | 3,962 (TN) | 51 (FP) |
| Actual Malicious | 66 (FN) | 3,438 (TP) |
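The threshold-based metrics in the Performance table can be recomputed from these counts, which is a quick consistency check (AUC and loss need the raw scores, so they cannot be derived here). The snippet only uses the four numbers from the confusion matrix.

```python
# Counts from the test-set confusion matrix
tn, fp, fn, tp = 3_962, 51, 66, 3_438

total = tn + fp + fn + tp
accuracy = (tp + tn) / total  # share of all predictions that are correct
precision = tp / (tp + fp)    # share of flagged messages that are malicious
recall = tp / (tp + fn)       # share of malicious messages that get flagged

print(f"test samples: {total:,}")
print(f"accuracy:  {accuracy:.4f}")
print(f"precision: {precision:.4f}")
print(f"recall:    {recall:.4f}")
```

The results round to 0.9844, 0.9854, and 0.9812 over 7,517 test samples, matching the Performance table.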
License
MIT: free for personal and commercial use.