Scam detection - NLP (ML project)

Python project layout for training and serving a scam / phishing / coercion text classifier. Multilingual support can be added later through the choice of model and dataset.

Layout

scam-nlp-ml/
├── data/
│   ├── raw/          # Original CSVs, dumps, exports (contents gitignored; keep samples elsewhere)
│   └── processed/    # Train/val splits, tokenized cache
├── models/           # Checkpoints, exported ONNX/Torch artifacts
├── src/              # Training, evaluation, data pipeline code
├── api/              # Optional FastAPI inference service
├── notebooks/        # EDA and experiments
├── requirements.txt
├── .env.example
└── README.md

Quick start

  1. Create a virtual environment (Python 3.10+ recommended):

    python -m venv .venv
    
  2. Activate:

    • Windows: .venv\Scripts\activate
    • macOS/Linux: source .venv/bin/activate
  3. Install dependencies:

    pip install -U pip
    pip install -r requirements.txt
    
  4. Environment:

    • Windows: copy .env.example .env
    • macOS/Linux: cp .env.example .env

    Then edit .env with your paths and hyperparameters.
    
  5. Place raw datasets under data/raw/, then implement preprocessing in src/ (add modules as you build).
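As a starting point for the preprocessing mentioned in step 5, here is a minimal sketch of the clean-and-split stage using only the standard library. It assumes each raw record has already been loaded as a (text, label) pair; the function names clean_text and stratified_split, the 20% validation fraction, and the seed are illustrative choices, not part of this scaffold.

```python
import random
import re
from collections import defaultdict


def clean_text(text: str) -> str:
    """Collapse runs of whitespace and strip the ends; case and punctuation are kept."""
    return re.sub(r"\s+", " ", text).strip()


def stratified_split(rows, val_fraction=0.2, seed=42):
    """Split (text, label) rows into train/val lists with similar label ratios.

    Empty texts and exact duplicates (after cleaning) are dropped first.
    Labels must be mutually comparable (for example, all ints or all strings).
    """
    seen = set()
    by_label = defaultdict(list)
    for text, label in rows:
        cleaned = clean_text(text)
        if not cleaned or cleaned in seen:
            continue
        seen.add(cleaned)
        by_label[label].append((cleaned, label))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # Keep at least one validation example per label when the label has 2+ rows.
        n_val = max(1, round(len(items) * val_fraction)) if len(items) > 1 else 0
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val
```

Splitting per label keeps the scam class represented in the validation set even when it is rare, which matters for the precision/recall figures computed later. A real src/data.py would probably also handle file loading and language-specific normalization.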

Notes

  • Do not commit secrets or large raw datasets; keep secrets in an untracked .env file and add .gitignore rules for data/raw/* and models/* as needed.
  • For India-focused scams (e.g. digital-arrest SMS), ensure your labels and evaluation reflect those patterns; consider a multilingual encoder (e.g. xlm-roberta-base) when you expand languages.
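To illustrate the first note, the sketch below reads paths and hyperparameters from environment variables rather than hard-coding them. The variable names (RAW_DATA_DIR, MODEL_NAME, LEARNING_RATE, EPOCHS) and their defaults are hypothetical; align them with whatever .env.example actually defines. It also assumes the values from .env have already been exported into the process environment, for example by a loader such as python-dotenv.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    raw_dir: str
    model_name: str
    learning_rate: float
    epochs: int


def load_settings(env=None) -> Settings:
    """Build Settings from environment variables, falling back to defaults.

    All variable names and defaults here are hypothetical placeholders.
    """
    env = os.environ if env is None else env
    return Settings(
        raw_dir=env.get("RAW_DATA_DIR", "data/raw"),
        model_name=env.get("MODEL_NAME", "xlm-roberta-base"),
        learning_rate=float(env.get("LEARNING_RATE", "2e-5")),
        epochs=int(env.get("EPOCHS", "3")),
    )
```

Passing a plain dict as env makes the loader easy to exercise in tests without touching the real environment.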

Next steps (implementation)

  • src/data.py: load, clean, split
  • src/train.py: fine-tune transformers
  • src/eval.py: metrics (precision/recall on the scam class)
  • api/main.py: POST /predict with text body

This repository scaffold only creates the folders and baseline config; add those modules as you iterate.
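For the metrics planned in src/eval.py, a minimal sketch of precision and recall on the scam class might look like the following. It assumes binary labels with 1 meaning scam; the function name precision_recall and the choice to return 0.0 when a denominator is zero are illustrative.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the positive class.

    precision = TP / (TP + FP), recall = TP / (TP + FN).
    Either value is reported as 0.0 when its denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With three scam messages of which two are caught, plus one false alarm, both values come out to 2/3. In practice scikit-learn's precision_score and recall_score with pos_label=1 compute the same quantities; the hand-written version is shown only to make the definitions explicit.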
