Tags: Text Classification, Transformers, Safetensors, English, bert, scam-detection, phishing-detection, cybersecurity, multilingual, Eval Results (legacy), text-embeddings-inference
How to use aattyy11/scam-nlp-ml with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="aattyy11/scam-nlp-ml")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("aattyy11/scam-nlp-ml")
model = AutoModelForSequenceClassification.from_pretrained("aattyy11/scam-nlp-ml")
```
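When the model is loaded directly, it returns raw logits rather than labels. The sketch below shows one way to turn a single row of logits into a label and a probability. It assumes the label names come from `model.config.id2label`; the card does not say which index is the scam class. The softmax uses only the standard library, so the helper runs without PyTorch. The commented lines show where the logits would come from.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_prediction(logits, id2label):
    """Return (label, probability) for the highest-scoring class.

    `id2label` maps class index -> label name, e.g. model.config.id2label.
    On a tie, the lowest index wins.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Usage with the model loaded above (requires torch; label names depend on the checkpoint):
# import torch
# inputs = tokenizer("Your parcel is on hold, pay the fee at this link", return_tensors="pt")
# with torch.no_grad():
#     logits = model(**inputs).logits[0].tolist()
# print(logits_to_prediction(logits, model.config.id2label))

if __name__ == "__main__":
    # Placeholder logits and the default Transformers label names, for illustration only.
    print(logits_to_prediction([0.2, 2.3], {0: "LABEL_0", 1: "LABEL_1"}))
```

If you only need the label and score, the pipeline call returns the same information as a list of `{"label": ..., "score": ...}` dicts.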
Scam detection: NLP (ML project)
Python project layout for training and serving a scam / phishing / coercion text classifier. Multilingual support can be added later through the choice of model and dataset.
Layout
```
scam-nlp-ml/
├── data/
│   ├── raw/            # Original CSVs, dumps, exports (contents gitignored; keep samples elsewhere)
│   └── processed/      # Train/val splits, tokenized cache
├── models/             # Checkpoints, exported ONNX/Torch artifacts
├── src/                # Training, evaluation, data pipeline code
├── api/                # Optional FastAPI inference service
├── notebooks/          # EDA and experiments
├── requirements.txt
├── .env.example
└── README.md
```
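A short script can recreate the layout above. This is a minimal sketch using `pathlib`. The function name `create_layout` is hypothetical, and the three top-level files are created empty rather than with any particular contents.

```python
from pathlib import Path

# Directories from the layout above, relative to the project root.
DIRS = ["data/raw", "data/processed", "models", "src", "api", "notebooks"]
# Top-level files; touch() creates them empty and leaves existing files untouched.
FILES = ["requirements.txt", ".env.example", "README.md"]


def create_layout(root):
    """Create the scam-nlp-ml folder skeleton under `root` (safe to run repeatedly)."""
    root = Path(root)
    for d in DIRS:
        (root / d).mkdir(parents=True, exist_ok=True)
    for f in FILES:
        (root / f).touch(exist_ok=True)
    return root


if __name__ == "__main__":
    create_layout("scam-nlp-ml")
```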
Quick start
Create a virtual environment (Python 3.10+ recommended):
```bash
python -m venv .venv
```

Activate:
- Windows: `.venv\Scripts\activate`
- macOS/Linux: `source .venv/bin/activate`
Install dependencies:

```bash
pip install -U pip
pip install -r requirements.txt
```

Environment: copy the template, then edit .env with your paths and hyperparameters.
- Windows: `copy .env.example .env`
- macOS/Linux: `cp .env.example .env`

Place raw datasets under data/raw/, then implement preprocessing in src/ (add modules as you build).
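The `.env` file holds paths and hyperparameters. Below is a minimal standard-library sketch for reading simple `KEY=VALUE` lines. The keys in the demo (`DATA_DIR`, `MODEL_NAME`, `LEARNING_RATE`) are hypothetical. A project may prefer the `python-dotenv` package, which handles quoting and other edge cases more completely.

```python
import os


def parse_env(text):
    """Parse simple KEY=VALUE lines; skips blank lines, '#' comments and lines without '='.

    Inline comments after a value are NOT stripped in this sketch.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env(path=".env"):
    """Read `path` if it exists and export its values without overriding existing env vars."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        values = parse_env(fh.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


if __name__ == "__main__":
    # Hypothetical keys; use whatever your .env.example defines.
    example = "DATA_DIR=data/raw\nMODEL_NAME=xlm-roberta-base\n# comment\nLEARNING_RATE=2e-5\n"
    cfg = parse_env(example)
    print(cfg["MODEL_NAME"], float(cfg["LEARNING_RATE"]))
```

Values come back as strings, so numeric hyperparameters need an explicit `float()` or `int()` at the point of use.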
Notes
- Do not commit secrets or large raw datasets. Keep secrets in .env and, if needed, add .gitignore rules for data/raw/* and models/*.
- For India-focused scams (e.g. digital-arrest SMS), make sure your labels and evaluation reflect those patterns. Consider a multilingual encoder (e.g. xlm-roberta-base) when you expand to other languages.
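To check that evaluation reflects India-focused patterns such as digital-arrest SMS, one option is to report recall separately for each scam subtype as well as overall. The sketch below assumes each evaluation example carries a hypothetical `pattern` tag alongside its gold label and prediction, with 1 meaning scam. The tag names are illustrative.

```python
from collections import defaultdict


def recall_by_pattern(examples):
    """Recall on the scam class (label 1), grouped by each example's `pattern` tag.

    `examples` is an iterable of dicts with keys "label", "pred", "pattern".
    Only gold-scam examples count, because recall ignores true negatives.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ex in examples:
        if ex["label"] != 1:
            continue
        totals[ex["pattern"]] += 1
        hits[ex["pattern"]] += int(ex["pred"] == 1)
    return {p: hits[p] / totals[p] for p in totals}


if __name__ == "__main__":
    # Illustrative data only, not results from this model.
    demo = [
        {"label": 1, "pred": 1, "pattern": "digital_arrest"},
        {"label": 1, "pred": 0, "pattern": "digital_arrest"},
        {"label": 1, "pred": 1, "pattern": "phishing_link"},
        {"label": 0, "pred": 0, "pattern": "none"},
    ]
    print(recall_by_pattern(demo))
```

A low per-pattern recall can point to a subtype that is under-represented in training data, even when the overall scores look good.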
Next steps (implementation)
- src/data.py: load, clean, split
- src/train.py: fine-tune transformers
- src/eval.py: metrics (precision/recall on the scam class)
- api/main.py: POST /predict with a text body
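For `src/data.py`, this is a minimal load, clean and split sketch that uses only the standard library. It assumes a CSV with hypothetical `text` and `label` columns. It uses a stratified split so the scam class keeps roughly the same share in train and validation. The column names, cleaning rules and 80/20 ratio are assumptions, not the project's choices.

```python
import csv
import random
import re
from collections import defaultdict


def clean(text):
    """Collapse runs of whitespace and trim; keep case and punctuation for the model."""
    return re.sub(r"\s+", " ", text).strip()


def load_rows(path):
    """Read (text, label) pairs from a CSV with hypothetical `text` and `label` columns."""
    rows = []
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            text = clean(row["text"])
            if text:  # drop empty messages
                rows.append((text, int(row["label"])))
    return rows


def dedupe(rows):
    """Drop exact duplicate texts (first label wins) so one message cannot land in both splits."""
    seen, out = set(), []
    for text, label in rows:
        if text not in seen:
            seen.add(text)
            out.append((text, label))
    return out


def stratified_split(rows, val_fraction=0.2, seed=42):
    """Split per label so each class keeps roughly the same ratio in train and val."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val
```

Deduplicating before splitting matters for scam data. Templated SMS campaigns often repeat verbatim, and a copy in both splits would inflate validation scores.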
This repository scaffold only creates the folders and baseline config; add those modules as you iterate.
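The planned `api/main.py` is described as a FastAPI service that exposes `POST /predict`. The sketch below swaps in the standard library's `http.server` so it runs without extra dependencies. It does not show FastAPI itself. The request and response shapes (`{"text": ...}` in, `{"label": ..., "score": ...}` out) and the stub `classify` function are hypothetical. Replace them with the real model call.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify(text):
    """Hypothetical stub: a keyword placeholder, NOT the model. Swap in the fine-tuned classifier."""
    return {"label": "scam" if "otp" in text.lower() else "not_scam", "score": 0.5}


def handle_predict(body):
    """Validate a JSON body like {"text": "..."} and return (status_code, payload)."""
    try:
        data = json.loads(body or b"{}")
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, {"error": "body must be JSON"}
    text = data.get("text") if isinstance(data, dict) else None
    if not isinstance(text, str) or not text.strip():
        return 422, {"error": "field 'text' must be a non-empty string"}
    return 200, classify(text)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            status, payload = 404, {"error": "not found"}
        else:
            length = int(self.headers.get("Content-Length", 0))
            status, payload = handle_predict(self.rfile.read(length))
        out = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)


if __name__ == "__main__":
    # Try: curl -X POST localhost:8000/predict -d '{"text": "share your OTP"}'
    HTTPServer(("127.0.0.1", 8000), PredictHandler).serve_forever()
```

Keeping validation in `handle_predict`, separate from the HTTP handler, lets the same logic move into a FastAPI route later and be tested without starting a server.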
Evaluation results
- Accuracy (self-reported): 0.976
- F1 (self-reported): 0.957
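The card reports accuracy and F1 but does not state the evaluation split or which class the F1 refers to. For reference, here is a minimal sketch of how accuracy and scam-class precision, recall and F1 (the metrics `src/eval.py` is meant to report) are computed from gold labels and predictions. It treats label 1 as the scam class, which is an assumption. It is not necessarily how the numbers above were produced.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision/recall/F1 for the `positive` (scam) class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    # Define undefined ratios (0/0) as 0.0 rather than raising.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

scikit-learn's `precision_recall_fscore_support` gives the same per-class numbers. Accuracy alone can look high on imbalanced scam data, which is why scam-class recall is worth reporting alongside it.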