SetFit ModernBERT-base WAMP Router (Optimized ONNX)

This is a specialized SetFit model based on ModernBERT-base, exported to ONNX and optimized for high-performance LLM context pruning.

Key Features

  • 8192 Token Window: Native support for extremely long messages without sliding window overhead.
  • Memory Optimized: Only the attention from the last two layers (20 and 21) is exported, which avoids "Bad Allocation" errors on long contexts.
  • 3-Class Intent Classification: Routes tasks into Summary, Needle, and Reasoning, with a reported accuracy of 100%.
  • Dual Precision: Includes both FP32 (model.onnx) and INT8 Quantized (model_quantized.onnx) versions.
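
The memory argument behind exporting only two attention layers can be checked with back-of-envelope arithmetic. The sketch below is an estimate, not a measurement: it assumes each exported attention map is a dense FP32 tensor of shape (heads, seq_len, seq_len) at batch size 1, and that ModernBERT-base has 22 layers with 12 heads each. The function name is hypothetical and not part of the export script.

```python
BYTES_FP32 = 4  # assumption: attention maps are exported as 32-bit floats
GIB = 1024 ** 3

def attention_map_bytes(seq_len, num_heads=12, num_layers=1,
                        bytes_per_value=BYTES_FP32):
    """Size in bytes of dense attention maps for one sequence (batch size 1)."""
    return seq_len * seq_len * num_heads * num_layers * bytes_per_value

# Full 8192-token window: all 22 layers versus only layers 20 and 21.
all_layers = attention_map_bytes(8192, num_layers=22) / GIB
last_two = attention_map_bytes(8192, num_layers=2) / GIB
print(f"all 22 layers: {all_layers:.0f} GiB, last 2 layers: {last_two:.0f} GiB")
# prints "all 22 layers: 66 GiB, last 2 layers: 6 GiB"
```

Under these assumptions, each layer's attention map costs 3 GiB at the full window, so limiting the export to two layers cuts the attention output by roughly a factor of eleven.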

Classification Map

  • Label 0: Summary (Chatter, Recaps, TL;DR)
  • Label 1: Needle (Pinpoint facts, parameters, keys, IPs)
  • Label 2: Reasoning (Comparison, analysis, logical chains)
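
A minimal sketch of applying this map to a model output, assuming the classifier yields one score per class in label order. The `route` helper and the example scores are hypothetical and only illustrate the lookup.

```python
# Label ids and names as listed in the Classification Map.
LABELS = {0: "Summary", 1: "Needle", 2: "Reasoning"}

def route(scores):
    """Return the intent name for the highest-scoring class."""
    if len(scores) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} scores, got {len(scores)}")
    # Index of the largest score (argmax without external dependencies).
    best = max(range(len(scores)), key=lambda i: scores[i])
    return LABELS[best]

print(route([0.1, 0.7, 0.2]))  # prints "Needle"
```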

Project Origin

This model is the primary engine for the WAMP-proxy project.

Usage in WAMP-proxy

Update your .env file:

FILTER_MODEL_DIR=./model_modernbert_onnx
# Can be raised up to 8192
FILTER_MAX_TOKENS=2048
FILTER_NEEDLE_ALGO=cls_max
FILTER_REASONING_ALGO=max_max
FILTER_SUMMARY_ALGO=max_max
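
Before starting the proxy, it can help to sanity-check these values. The sketch below is an illustration only and is not how WAMP-proxy reads its configuration: `parse_env` and `validate` are hypothetical helpers, and the parser is simplified (it treats everything after a `#` as a comment, so values containing `#` are not supported).

```python
MAX_CONTEXT = 8192  # token window stated under Key Features

def parse_env(text):
    """Parse KEY=VALUE lines, dropping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

def validate(settings):
    """Check that FILTER_MAX_TOKENS fits inside the 8192-token window."""
    max_tokens = int(settings["FILTER_MAX_TOKENS"])
    if not 1 <= max_tokens <= MAX_CONTEXT:
        raise ValueError(f"FILTER_MAX_TOKENS must be in 1..{MAX_CONTEXT}")
    return max_tokens

example = """
FILTER_MODEL_DIR=./model_modernbert_onnx
FILTER_MAX_TOKENS=2048
FILTER_NEEDLE_ALGO=cls_max
"""
settings = parse_env(example)
print(validate(settings))  # prints 2048
```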

License

MIT
