Papers
arxiv:2510.17018

CoGate-LSTM: Prototype-Guided Feature-Space Gating for Mitigating Gradient Dilution in Imbalanced Toxic Comment Classification

Published on Apr 7

Abstract

A novel recurrent architecture with cosine-similarity feature gating improves toxic text classification under extreme class imbalance, outperforming fine-tuned BERT while using significantly fewer parameters and achieving faster inference.

Toxic text classification for online moderation remains challenging under extreme class imbalance, where rare but high-risk labels such as threat and severe_toxic are consistently underdetected by conventional models. We propose CoGate-LSTM, a parameter-efficient recurrent architecture built around a novel cosine-similarity feature gating mechanism that adaptively rescales token embeddings by their directional similarity to a learned toxicity prototype. Unlike token-position attention, the gate emphasizes feature directions most informative for minority toxic classes. The model combines frozen multi-source embeddings (GloVe, FastText, and BERT-CLS), a character-level BiLSTM, embedding-space SMOTE, and weighted focal loss. On the Jigsaw Toxic Comment benchmark, CoGate-LSTM achieves 0.881 macro-F1 (95% CI: [0.873, 0.889]) and 96.0% accuracy, outperforming fine-tuned BERT by 6.9 macro-F1 points (p < 0.001) and XGBoost by 4.7, while using only 7.3M parameters (about 15times fewer than BERT) and 48 ms CPU inference latency. Gains are strongest on minority labels, with F1 improvements of +71% for severe_toxic, +33% for threat, and +28% for identity_hate relative to fine-tuned BERT. Ablations identify cosine gating as the primary driver of performance (-4.8 macro-F1 when removed), with additional benefits from character-level fusion (-2.4) and multi-head attention (-2.9). CoGate-LSTM also transfers reasonably across datasets, reaching a 0.71 macro-F1 zero-shot on the Contextual Abuse Dataset and 0.73 with lightweight threshold adaptation. These results show that direction-aware feature gating offers an effective and efficient alternative to large, fully fine-tuned transformers for classifying imbalanced toxic comments.

Community

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2510.17018
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2510.17018 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2510.17018 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2510.17018 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.