CyberPuppy v6 — Bilingual Text LoRA (LoRA-A)

Chinese cyberbullying detection v6 · Qwen3-8B + LoRA r=64 + 4-head + bilingual + CNTP adversarial training

v6 doubles the LoRA capacity (r=64) and extends training to 5 epochs to improve phonetic robustness against systematic homophonic perturbations. It is the text branch of a dual-LoRA ensemble and is paired with v6-pinyin-lora.
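To make "homophonic perturbation" concrete, here is a minimal sketch of why a pinyin view of the text is useful. It uses a tiny hand-written lookup table (`TOY_PINYIN`, a hypothetical stand-in for the pypinyin call used in the Quick Start) to show that swapping a character for a homophone changes the string but not its toneless pinyin.

```python
# Toy illustration of why a pinyin view resists homophone substitution.
# TOY_PINYIN is a hypothetical hand-written table for this example only;
# the Quick Start uses pypinyin with Style.NORMAL (toneless) instead.
TOY_PINYIN = {
    "笨": "ben",  # "stupid"
    "苯": "ben",  # "benzene"; a homophone of 笨 once tones are dropped
    "蛋": "dan",
}

def toy_to_pinyin(text: str) -> str:
    """Map each known character to toneless pinyin; pass others through."""
    return " ".join(TOY_PINYIN.get(ch, ch) for ch in text if ch.strip())

original = "笨蛋"
perturbed = "苯蛋"  # homophonic perturbation of the first character

print(original == perturbed)                                # False
print(toy_to_pinyin(original) == toy_to_pinyin(perturbed))  # True
print(toy_to_pinyin(perturbed))                             # ben dan
```

A character-level model sees two different inputs here, while a model fed the pinyin string sees the same input both times.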

What changed from v5

Setting	v5	v6
LoRA rank	32	64 (doubled capacity)
LoRA alpha	64	128
Training epochs	3	5 (with early stopping at the best checkpoint)
Max length (tokens)	128/192	192 (consistent)
Consistency loss	λ=0.5	λ=0.5
Best F1 (dev)	0.8440	0.8435 (peaked at epoch 1)

Note: the best checkpoints for both v5.1 and v6 occur at epoch 1-2. The additional capacity helps on downstream test sets (especially HED-COLD) rather than on standalone dev F1.
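As a rough illustration of what "doubled capacity" means, the sketch below counts the trainable parameters LoRA adds to a single linear layer and computes the adapter scaling factor for both versions. The layer shape is a hypothetical placeholder, not read from the model config, and the scaling assumes the standard alpha/r convention rather than a rank-stabilized variant.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer:
    matrix A has shape (r, d_in) and matrix B has shape (d_out, r)."""
    return r * (d_in + d_out)

# Hypothetical layer shape for illustration; not taken from the model config.
D_IN, D_OUT = 4096, 4096

v5 = {"r": 32, "alpha": 64}
v6 = {"r": 64, "alpha": 128}

p5 = lora_params(D_IN, D_OUT, v5["r"])
p6 = lora_params(D_IN, D_OUT, v6["r"])

# ASSUMPTION: standard LoRA scaling, where the update is multiplied by alpha / r.
scale5 = v5["alpha"] / v5["r"]
scale6 = v6["alpha"] / v6["r"]

print(p5, p6, p6 / p5)  # 262144 524288 2.0
print(scale5, scale6)   # 2.0 2.0
```

Under these assumptions the adapter parameter count doubles for every adapted layer, while the alpha/r ratio stays at 2, so the magnitude of the adapter update is scaled the same way in both versions.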

Performance (v6 dual-LoRA ensemble, α=0.60, dec_thresh=0.52)

Benchmark	v5.1.1	v6.0	Δ
COLD test F1_w	0.8302	0.8321	+0.19pt ✓
PCR-ToxiCN (real-world)	0.7162	0.7119	−0.43pt
ToxiCloakCN homo abs F1	0.8496	0.8510	+0.14pt ✓
HED-COLD (homophone)	0.9126	0.9317	+1.91pt ✓✓
6 Traditional Chinese threats	6/6	6/6	—
Average	0.8272	0.8317	+0.45pt

PCR-ToxiCN remains above the published state of the art: v6.0's 0.7119 exceeds the published SOTA of 0.672 by 3.99pt (v5.1.1 exceeded it by 4.42pt). The 0.43pt regression is accepted in exchange for the HED-COLD gain.
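The deltas, the averages, and the margin over the quoted SOTA can be recomputed from the table with a few lines of arithmetic. This is only a consistency check on the numbers above; it assumes the "Average" row is the mean of the four numeric benchmarks and excludes the 6/6 threat row.

```python
# Scores copied from the table above as (v5.1.1, v6.0).
scores = {
    "COLD test F1_w": (0.8302, 0.8321),
    "PCR-ToxiCN": (0.7162, 0.7119),
    "ToxiCloakCN homo abs F1": (0.8496, 0.8510),
    "HED-COLD": (0.9126, 0.9317),
}
PUBLISHED_SOTA_PCR = 0.672  # figure quoted in the text above

def points(delta: float) -> float:
    """Convert an F1 difference to percentage points, rounded to 2 places."""
    return round(delta * 100, 2)

deltas = {name: points(new - old) for name, (old, new) in scores.items()}
avg_old = sum(old for old, _ in scores.values()) / len(scores)
avg_new = sum(new for _, new in scores.values()) / len(scores)
margin_v6 = points(scores["PCR-ToxiCN"][1] - PUBLISHED_SOTA_PCR)

for name, d in deltas.items():
    print(f"{name}: {d:+.2f}pt")
print(f"Average delta: {points(avg_new - avg_old):+.2f}pt")
print(f"PCR-ToxiCN margin over quoted SOTA: {margin_v6:+.2f}pt")
```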

Quick Start

import torch, re
from peft import PeftModel
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download
from pypinyin import pinyin, Style

device = torch.device("cuda")
dtype = torch.bfloat16
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B-Base")

def load_branch(repo_id):
    # Each branch loads its own copy of the base model, so the ensemble
    # keeps two 8B models in memory at once.
    base = AutoModel.from_pretrained("Qwen/Qwen3-8B-Base", torch_dtype=dtype, device_map=device)
    model = PeftModel.from_pretrained(base, repo_id, subfolder="lora")
    model.eval()
    # heads.pt: nn.ModuleDict({task: nn.Linear(H, dim)}) state_dict
    # weights_only=False unpickles arbitrary objects; only load checkpoints you trust.
    heads_state = torch.load(hf_hub_download(repo_id, "heads.pt"),
                             map_location=device, weights_only=False)["heads"]
    # Only the toxicity head is needed for the toxic / none decision.
    W_tox = heads_state["toxicity.weight"].to(device=device, dtype=dtype)  # (3, H)
    b_tox = heads_state["toxicity.bias"].to(device=device, dtype=dtype)    # (3,)
    return model, W_tox, b_tox

model_t, W_t, b_t = load_branch("thc1006/cyberpuppy-v6-bilingual")
model_p, W_p, b_p = load_branch("thc1006/cyberpuppy-v6-pinyin-lora")

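# Han characters: CJK Extension A, CJK Unified Ideographs, CJK Compatibility Ideographs.
_HAN = re.compile(r"[㐀-䶿一-鿿豈-﫿]")
def to_pinyin(text):
    # Han characters become toneless pinyin (Style.NORMAL); other characters
    # pass through unchanged and whitespace is dropped.
    return " ".join(pinyin(ch, style=Style.NORMAL)[0][0] if _HAN.match(ch) else ch
                    for ch in text if ch.strip())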

@torch.inference_mode()
def predict(text, alpha=0.60, dec_thresh=0.52):
    enc_t = tok(text, return_tensors="pt", truncation=True, max_length=192).to(device)
    enc_p = tok(to_pinyin(text), return_tensors="pt", truncation=True, max_length=192).to(device)
    # Last-token pooling: a single unpadded sequence is encoded per call,
    # so position -1 is the final real token.
    h_t = model_t(**enc_t).last_hidden_state[:, -1]
    h_p = model_p(**enc_p).last_hidden_state[:, -1]
    logits_t = (h_t @ W_t.T + b_t).float()
    logits_p = (h_p @ W_p.T + b_p).float()
    text_probs   = logits_t.softmax(-1)
    pinyin_probs = logits_p.softmax(-1)
    # v6.0 strategy: α=0.60 geo-mean, no pinyin override, decision_thresh=0.52
    ens = (text_probs ** alpha) * (pinyin_probs ** (1 - alpha))
    ens = ens / ens.sum(-1, keepdim=True)
    # Classes 1 and 2 are summed as toxic; class 0 is treated as non-toxic.
    toxic_prob = (ens[:, 1] + ens[:, 2]).item()
    return "toxic" if toxic_prob > dec_thresh else "none", toxic_prob

label, p = predict("你這個笨蛋，滾開！")
print(f"{label} (toxic_prob={p:.3f})")  # toxic
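The ensemble step inside predict() can be tried without loading any model. The sketch below reimplements the α-weighted geometric mean and the threshold decision in plain Python. The two probability vectors are hypothetical, chosen to mimic a case where the text branch is fooled by a homophone substitution and the pinyin branch is not.

```python
def geo_mean_ensemble(text_probs, pinyin_probs, alpha=0.60):
    """Weighted geometric mean of two class distributions, renormalized to sum to 1."""
    raw = [(t ** alpha) * (p ** (1 - alpha)) for t, p in zip(text_probs, pinyin_probs)]
    total = sum(raw)
    return [x / total for x in raw]

def decide(ens, dec_thresh=0.52):
    """Classes 1 and 2 count as toxic, as in predict() above."""
    toxic_prob = ens[1] + ens[2]
    return ("toxic" if toxic_prob > dec_thresh else "none"), toxic_prob

# Hypothetical branch outputs, not produced by the real models.
text_probs = [0.70, 0.20, 0.10]    # text branch alone: toxic mass 0.30
pinyin_probs = [0.15, 0.60, 0.25]  # pinyin branch alone: toxic mass 0.85

ens = geo_mean_ensemble(text_probs, pinyin_probs)
label, toxic_prob = decide(ens)
print(label, round(toxic_prob, 3))
```

With these made-up inputs the text branch on its own would fall below the 0.52 threshold, while the ensemble crosses it. A geometric mean penalizes any class that either branch considers unlikely, which is one reason it behaves differently from a plain weighted average.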

Training Details

Parameter	Value
Base model	Qwen/Qwen3-8B-Base
LoRA rank	64
LoRA alpha	128
Target modules	All linear (q/k/v/o/gate/up/down)
Training data	179,186 samples (v5 bilingual)
Epochs	5
Best epoch	1 (step 9954)
Learning rate	3e-5
Batch size	6 × 6 gradient-accumulation steps (effective 36)
Max length	192 tokens
Precision	bf16
Loss	Focal (γ=2.5) + uncertainty multi-task + consistency (λ=0.5)
Hardware	1× NVIDIA RTX 5090 (32GB, 590W OC)
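For readers unfamiliar with the focal term in the loss row, here is a minimal sketch of the standard focal loss for a single example with γ=2.5. It omits class weighting, the uncertainty-based multi-task weighting, and the consistency term, and it is not claimed to match the training code exactly. The probability vectors are hypothetical.

```python
import math

def cross_entropy(probs, target):
    """Plain cross-entropy for one example: -log(p_t)."""
    return -math.log(probs[target])

def focal_loss(probs, target, gamma=2.5):
    """Standard focal loss for one example: -(1 - p_t)^gamma * log(p_t),
    where p_t is the predicted probability of the true class."""
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

target = 1
easy = [0.05, 0.90, 0.05]  # hypothetical confident, correct prediction
hard = [0.60, 0.30, 0.10]  # hypothetical misclassified example

for name, probs in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(probs, target)
    fl = focal_loss(probs, target)
    print(f"{name}: CE={ce:.4f} focal={fl:.4f} ratio={fl / ce:.4f}")
```

The ratio column equals (1 - p_t)^γ, so the confidently correct example is down-weighted far more than the misclassified one, which shifts the gradient toward hard examples.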

Limitations

PCR-ToxiCN is slightly lower than v5.1.1 (0.7119 vs 0.7162). For maximum real-world robustness, use α=0.5, a pinyin override of 0.7, and a decision threshold of 0.48 instead: PCR-ToxiCN rises to 0.7283, but COLD drops to 0.8198.
English input: out of distribution.
Cultural / semantic attacks: e.g., 動物園貓孝子, 淺草寺吧. These require world knowledge, not phonetics. Roughly 30-40 of PCR-ToxiCN's 250 toxic samples remain hard.
Numeric / letter substitution attacks (G8, S13, 64.5克黄金=250): partially handled through training-data exposure, but not fully solved.
Sarcasm: 「你真太6了」 (roughly "you're really too awesome") used sarcastically can be missed.
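The two operating points mentioned in the first limitation can be written down as small presets. The sketch below is an assumption-heavy illustration: the names `Strategy`, `V6_DEFAULT`, and `PCR_TUNED` are hypothetical, and the exact meaning of "pinyin override" is not specified in this card. Here it is assumed to mean that the pinyin branch alone forces a toxic label once its own toxic probability reaches the override value.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Strategy:
    alpha: float
    dec_thresh: float
    pinyin_override: Optional[float] = None  # None disables the override

V6_DEFAULT = Strategy(alpha=0.60, dec_thresh=0.52)
PCR_TUNED = Strategy(alpha=0.50, dec_thresh=0.48, pinyin_override=0.70)

def decide(text_probs, pinyin_probs, s: Strategy) -> str:
    """Geometric-mean ensemble with an optional pinyin override."""
    raw = [(t ** s.alpha) * (p ** (1 - s.alpha)) for t, p in zip(text_probs, pinyin_probs)]
    toxic_prob = (raw[1] + raw[2]) / sum(raw)
    # ASSUMPTION: "override" means the pinyin branch alone can force "toxic".
    pinyin_toxic = pinyin_probs[1] + pinyin_probs[2]
    if s.pinyin_override is not None and pinyin_toxic >= s.pinyin_override:
        return "toxic"
    return "toxic" if toxic_prob > s.dec_thresh else "none"

# Hypothetical probabilities, not produced by the real models.
text_probs = [0.85, 0.10, 0.05]
pinyin_probs = [0.28, 0.50, 0.22]

print(decide(text_probs, pinyin_probs, V6_DEFAULT))  # none
print(decide(text_probs, pinyin_probs, PCR_TUNED))   # toxic
```

Under this reading, the tuned preset trades precision on clean text for recall on cloaked text, which is consistent with the PCR-ToxiCN gain and COLD drop reported above.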

License

CC BY-NC-SA 4.0. Non-commercial research and educational use only.

Citation

@misc{cyberpuppy_v6_2026,
  author       = {Tsai, Hung-Che},
  title        = {CyberPuppy v6: Doubled-Capacity Dual-LoRA for Chinese Cyberbullying Detection},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/thc1006/cyberpuppy-v6-bilingual}}
}

Related Models

thc1006/cyberpuppy-v6-pinyin-lora — companion pinyin LoRA
thc1006/cyberpuppy-v5-bilingual — previous version (still has higher PCR-ToxiCN)
thc1006/cyberpuppy-v5-pinyin-lora — v5 pinyin LoRA

Contact & Takedown

Author: Hung-Che Tsai (hctsai1006@cs.nctu.edu.tw)
Takedown: Email above — removed within 7 days

Evaluation results (self-reported)

Metric	Dataset	Split	Value
F1 (weighted)	COLD	test set	0.832
F1 (weighted, exceeds SOTA 0.672 by 3.99pt)	PCR-ToxiCN	not stated	0.712
F1 (homophone-absolute)	ToxiCloakCN (heldout)	test set	0.851
F1 (weighted)	HED-COLD	test set	0.932