CyberPuppy v6 — Bilingual Text LoRA (LoRA-A)

Chinese cyberbullying detection v6 · Qwen3-8B + LoRA r=64 + 4-head + bilingual + CNTP adversarial training

v6 doubles the LoRA capacity (r=64) and trains for 5 epochs to improve phonetic robustness against systematic homophone perturbations. It is one branch of a dual-LoRA ensemble, paired with v6-pinyin-lora.

What changed from v5

| Setting | v5 | v6 |
|---|---|---|
| LoRA rank | 32 | 64 (doubled capacity) |
| LoRA alpha | 64 | 128 |
| Training epochs | 3 | 5 (with early stopping at the best checkpoint) |
| Max length | 128/192 | 192 (consistent) |
| Consistency loss | λ=0.5 | λ=0.5 |
| Best F1 (dev) | 0.8440 | 0.8435 (peaked at epoch 1) |

Note: The best checkpoints of both v5.1 and v6 peak at epoch 1-2. The additional capacity helps on downstream test sets (especially HED-COLD) rather than on standalone dev F1.
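
One way to read the rank and alpha rows together: in standard LoRA the adapter update is scaled by alpha / r, so doubling both keeps the scale unchanged while doubling the adapter parameters per adapted layer. The sketch below only checks that arithmetic. The hidden size of 4096 and the `lora_params` helper are illustrative assumptions, not values stated in this card.

```python
def lora_scale(rank: int, alpha: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update (alpha / r)."""
    return alpha / rank


def lora_params(rank: int, d_in: int, d_out: int) -> int:
    """Trainable parameters of one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


V5 = {"rank": 32, "alpha": 64}   # from the v5 column
V6 = {"rank": 64, "alpha": 128}  # from the v6 column
HIDDEN = 4096                    # ASSUMPTION: illustrative square projection size

scale_v5 = lora_scale(**V5)
scale_v6 = lora_scale(**V6)
params_v5 = lora_params(V5["rank"], HIDDEN, HIDDEN)
params_v6 = lora_params(V6["rank"], HIDDEN, HIDDEN)

print(f"scale   v5={scale_v5} v6={scale_v6}")
print(f"params  v5={params_v5:,} v6={params_v6:,} (per {HIDDEN}x{HIDDEN} layer)")
```

Because the scale is the same in both versions, the rank change is the only capacity difference between the two adapters in this comparison.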

Performance (dual-LoRA ensemble v6 + α=0.60 + dec_thresh=0.52)

| Benchmark | v5.1.1 | v6.0 | Δ |
|---|---|---|---|
| COLD test F1_w | 0.8302 | 0.8321 | +0.19pt |
| PCR-ToxiCN (real-world) | 0.7162 | 0.7119 | −0.43pt |
| ToxiCloakCN homo abs F1 | 0.8496 | 0.8510 | +0.14pt |
| HED-COLD (homophone) | 0.9126 | 0.9317 | +1.91pt ✓✓ |
| 6 Traditional Chinese threats | 6/6 | 6/6 | |
| Average | 0.8272 | 0.8317 | +0.45pt |

PCR-ToxiCN remains above published SOTA: v6.0's 0.7119 exceeds the published SOTA of 0.672 by 3.99pt (v5.1.1 led by 4.42pt). The 0.43pt drop is the accepted trade-off for the HED-COLD gain.
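
The deltas and averages above can be re-derived from the four F1 rows. The short sketch below does that arithmetic; it assumes the Average row is the unweighted mean of the four F1 benchmarks (the 6/6 threat check is a count, not an F1), which is consistent with the reported numbers.

```python
# (v5.1.1, v6.0) scores copied from the benchmark table
scores = {
    "COLD test F1_w": (0.8302, 0.8321),
    "PCR-ToxiCN": (0.7162, 0.7119),
    "ToxiCloakCN homo abs F1": (0.8496, 0.8510),
    "HED-COLD": (0.9126, 0.9317),
}
PUBLISHED_SOTA_PCR = 0.672  # figure quoted in the card


def delta_pt(old: float, new: float) -> float:
    """Difference expressed in points (1pt = 0.01 F1), rounded to 2 decimals."""
    return round((new - old) * 100, 2)


deltas = {name: delta_pt(old, new) for name, (old, new) in scores.items()}
avg_old = sum(old for old, _ in scores.values()) / len(scores)
avg_new = sum(new for _, new in scores.values()) / len(scores)
avg_delta = delta_pt(avg_old, avg_new)

margin_old = delta_pt(PUBLISHED_SOTA_PCR, scores["PCR-ToxiCN"][0])
margin_new = delta_pt(PUBLISHED_SOTA_PCR, scores["PCR-ToxiCN"][1])

for name, d in deltas.items():
    print(f"{name}: {d:+.2f}pt")
print(f"average delta: {avg_delta:+.2f}pt")
print(f"PCR-ToxiCN margin over published SOTA: v5.1.1 {margin_old:+.2f}pt, v6.0 {margin_new:+.2f}pt")
```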

Quick Start

import torch, re
from peft import PeftModel
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download
from pypinyin import pinyin, Style

device = torch.device("cuda")
dtype = torch.bfloat16
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B-Base")

def load_branch(repo_id):
    base = AutoModel.from_pretrained("Qwen/Qwen3-8B-Base", torch_dtype=dtype, device_map=device)
    model = PeftModel.from_pretrained(base, repo_id, subfolder="lora")
    model.eval()
    # heads.pt: nn.ModuleDict({task: nn.Linear(H, dim)}) state_dict
    heads_state = torch.load(hf_hub_download(repo_id, "heads.pt"),
                             map_location=device, weights_only=False)["heads"]
    W_tox = heads_state["toxicity.weight"].to(device=device, dtype=dtype)  # (3, H)
    b_tox = heads_state["toxicity.bias"].to(device=device, dtype=dtype)    # (3,)
    return model, W_tox, b_tox

model_t, W_t, b_t = load_branch("thc1006/cyberpuppy-v6-bilingual")
model_p, W_p, b_p = load_branch("thc1006/cyberpuppy-v6-pinyin-lora")

# Han ideographs: CJK Extension A, CJK Unified, CJK Compatibility
_HAN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff]")
def to_pinyin(text):
    return " ".join(pinyin(ch, style=Style.NORMAL)[0][0] if _HAN.match(ch) else ch
                    for ch in text if ch.strip())

@torch.inference_mode()
def predict(text, alpha=0.60, dec_thresh=0.52):
    enc_t = tok(text, return_tensors="pt", truncation=True, max_length=192).to(device)
    enc_p = tok(to_pinyin(text), return_tensors="pt", truncation=True, max_length=192).to(device)
    h_t = model_t(**enc_t).last_hidden_state[:, -1]
    h_p = model_p(**enc_p).last_hidden_state[:, -1]
    logits_t = (h_t @ W_t.T + b_t).float()
    logits_p = (h_p @ W_p.T + b_p).float()
    text_probs   = logits_t.softmax(-1)
    pinyin_probs = logits_p.softmax(-1)
    # v6.0 strategy: α=0.60 geo-mean, no pinyin override, decision_thresh=0.52
    ens = (text_probs ** alpha) * (pinyin_probs ** (1 - alpha))
    ens = ens / ens.sum(-1, keepdim=True)
    toxic_prob = (ens[:, 1] + ens[:, 2]).item()
    return "toxic" if toxic_prob > dec_thresh else "none", toxic_prob

label, p = predict("你這個笨蛋,滾開!")  # "You idiot, get lost!"
print(f"{label} (toxic_prob={p:.3f})")  # expected label: toxic
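
To make the ensemble step in `predict` concrete, the sketch below reimplements the weighted geometric mean on plain Python lists, with no model required. The two probability vectors are made-up illustrative numbers, and the class order (index 0 = none, indices 1 and 2 = toxic classes) is inferred from how `predict` sums `ens[:, 1] + ens[:, 2]`.

```python
def geo_mean_ensemble(text_probs, pinyin_probs, alpha=0.60):
    """Weighted geometric mean of two distributions, renormalised to sum to 1."""
    raw = [(t ** alpha) * (p ** (1 - alpha)) for t, p in zip(text_probs, pinyin_probs)]
    total = sum(raw)
    return [r / total for r in raw]


def decide(ens, dec_thresh=0.52):
    """Collapse the 3-way distribution to a binary label, as predict() does."""
    toxic_prob = ens[1] + ens[2]
    return ("toxic" if toxic_prob > dec_thresh else "none"), toxic_prob


# Hypothetical branch outputs: the text branch leans "none",
# the pinyin branch is fairly confident the input is toxic.
text_probs = [0.55, 0.35, 0.10]
pinyin_probs = [0.20, 0.60, 0.20]

ens = geo_mean_ensemble(text_probs, pinyin_probs)
label, toxic_prob = decide(ens)
print(label, round(toxic_prob, 3))
```

With alpha=0.60 the text branch gets the larger exponent, but a geometric mean still lets a confident pinyin branch pull the ensemble across the threshold, since a class needs support from both branches to keep a high score.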

Training Details

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-8B-Base |
| LoRA rank | 64 |
| LoRA alpha | 128 |
| Target modules | All linear (q/k/v/o/gate/up/down) |
| Training data | 179,186 samples (v5 bilingual) |
| Epochs | 5 |
| Best epoch | 1 (step 9954) |
| Learning rate | 3e-5 |
| Batch size | 6 × 6 gradient accumulation |
| Max length | 192 tokens |
| Precision | bf16 |
| Loss | Focal (γ=2.5) + uncertainty multi-task + consistency (λ=0.5) |
| Hardware | 1× NVIDIA RTX 5090 (32GB, 590W OC) |
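
The focal term in the loss row down-weights examples the model already classifies confidently. Below is a minimal pure-Python sketch of the standard per-example focal loss, FL = -(1 - p_t)^γ · log(p_t), using γ=2.5 from the table. It omits the uncertainty weighting and the consistency term, whose exact form this card does not specify, and the example probabilities are illustrative only.

```python
import math


def cross_entropy(probs, target):
    """Plain negative log-likelihood of the target class."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.5):
    """Standard focal loss: cross-entropy scaled by (1 - p_t) ** gamma."""
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Illustrative predictions for a sample whose true class is index 0
easy = [0.95, 0.03, 0.02]  # already confident and correct
hard = [0.30, 0.60, 0.10]  # currently misclassified

for name, probs in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(probs, 0)
    fl = focal_loss(probs, 0)
    print(f"{name}: CE={ce:.4f} focal={fl:.4f} kept={fl / ce:.4%}")
```

The fraction of cross-entropy that survives is (1 - p_t)^γ, so the easy example contributes almost nothing while the hard one keeps a large share of its loss.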

Limitations

  1. PCR-ToxiCN is slightly lower than v5.1.1 (0.7119 vs 0.7162). For maximum real-world robustness, use α=0.5, pinyin override 0.7, and threshold 0.48 instead; this raises PCR-ToxiCN to 0.7283 but lowers COLD to 0.8198.
  2. English input is out of distribution.
  3. Cultural / semantic attacks (e.g., 動物園貓孝子, 淺草寺吧) require world knowledge, not phonetics. About 30-40 of PCR-ToxiCN's 250 toxic samples remain hard.
  4. Numeric / letter substitution attacks (G8, S13, 64.5克黄金=250) are partially handled through training-data exposure but not fully solved.
  5. Sarcasm: 「你真太6了」 (roughly "you're just too awesome") can be missed when used sarcastically.
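
The two operating points mentioned in limitation 1 can be kept side by side as small config objects. The sketch below is hypothetical: `DecisionConfig` and `decide` are illustrative names, and this card does not define how the pinyin override is applied. The code assumes one plausible reading, namely that an input is flagged whenever the pinyin branch's own toxic probability reaches the override value, regardless of the ensemble score. The probability vectors are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DecisionConfig:
    alpha: float                             # weight of the text branch
    dec_thresh: float                        # ensemble toxic-probability threshold
    pinyin_override: Optional[float] = None  # None means no override


V6_DEFAULT = DecisionConfig(alpha=0.60, dec_thresh=0.52)
PCR_ROBUST = DecisionConfig(alpha=0.50, dec_thresh=0.48, pinyin_override=0.70)


def decide(text_probs, pinyin_probs, cfg):
    """Weighted geometric-mean ensemble followed by a thresholded decision."""
    raw = [(t ** cfg.alpha) * (p ** (1 - cfg.alpha))
           for t, p in zip(text_probs, pinyin_probs)]
    total = sum(raw)
    ens = [r / total for r in raw]
    toxic_prob = ens[1] + ens[2]
    pinyin_toxic = pinyin_probs[1] + pinyin_probs[2]
    # ASSUMED override rule (not specified in the card): a confident pinyin
    # branch alone is enough to flag the input.
    if cfg.pinyin_override is not None and pinyin_toxic >= cfg.pinyin_override:
        return "toxic", toxic_prob
    return ("toxic" if toxic_prob > cfg.dec_thresh else "none"), toxic_prob


# Hypothetical homophone attack: the text branch is fooled, the pinyin branch is not.
text_probs = [0.85, 0.10, 0.05]
pinyin_probs = [0.25, 0.55, 0.20]

for name, cfg in (("v6 default", V6_DEFAULT), ("PCR-robust", PCR_ROBUST)):
    label, prob = decide(text_probs, pinyin_probs, cfg)
    print(f"{name}: {label} (ensemble toxic_prob={prob:.3f})")
```

Under this reading the override trades precision for recall on disguised inputs, which matches the direction of the PCR-ToxiCN gain and COLD drop reported in limitation 1.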

License

CC BY-NC-SA 4.0. Non-commercial research and educational use only.

Citation

@misc{cyberpuppy_v6_2026,
  author       = {Tsai, Hung-Che},
  title        = {CyberPuppy v6: Doubled-Capacity Dual-LoRA for Chinese Cyberbullying Detection},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/thc1006/cyberpuppy-v6-bilingual}}
}

Related Models

Contact & Takedown
