🚀 Released two lightweight, instruction-tuned Responsible AI models focused on toxicity, bias, and safety analysis
Model 1: Responsible AI Safety Assistant (Qwen 2.5)
Kurapika993/qwen2.5-7b-responsible-ai-qlora
Base Model: Qwen2.5-7B-Instruct
Method: QLoRA
Training Data: BeaverTails + Wiki Toxic + custom Responsible AI instruction dataset
The core idea is that the same utterance can be toxic or benign depending on the surrounding social situation. With this generation framework, you can create such datasets at scale.
The pipeline supports:
direct context augmentation given the seed utterance
new utterance-context pair generation given seed utterances
multistage generation for diverse examples
validation with a critic model
CSV / JSONL export
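To make the generate-then-validate flow concrete, here is a minimal sketch and not the framework's actual API. The names `ContextPair`, `generate_pairs`, `toy_generator`, and `toy_critic` are all hypothetical, and the generator and critic are plain stand-in functions where a real setup would presumably call language models.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ContextPair:
    """One utterance placed in one social context (hypothetical schema)."""
    utterance: str
    context: str
    label: str  # "benign" or "toxic"


def generate_pairs(
    seeds: Iterable[str],
    generate_context: Callable[[str, str], str],
    critic: Callable[[ContextPair], bool],
    labels=("benign", "toxic"),
) -> List[ContextPair]:
    """For each seed utterance, request one context per label and keep
    only the pairs that the critic accepts."""
    kept = []
    for utterance in seeds:
        for label in labels:
            context = generate_context(utterance, label)
            pair = ContextPair(utterance, context, label)
            if critic(pair):  # validation step
                kept.append(pair)
    return kept


# Stand-ins for model calls (assumptions, for illustration only).
def toy_generator(utterance: str, label: str) -> str:
    return f"[{label} situation for: {utterance}]"


def toy_critic(pair: ContextPair) -> bool:
    # A real critic would judge whether the context supports the label.
    return bool(pair.context.strip())


pairs = generate_pairs(
    ["You are so lucky to work from home."], toy_generator, toy_critic
)
```

Passing the generator and critic in as callables keeps the loop independent of any particular model backend, so either stage can be swapped without touching the other.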
Example:
Utterance: “You are so lucky to work from home.”
Benign context: A friend congratulates someone on improved work-life balance.
Toxic context: A colleague dismisses someone struggling with childcare and burnout.
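As a rough illustration of how the example above might look once exported, the sketch below writes it as JSONL and CSV with the Python standard library. The field names `utterance`, `context`, and `label` are assumptions for illustration; the framework's real export schema may differ.

```python
import csv
import io
import json

# The example above as two records (field names are hypothetical).
records = [
    {
        "utterance": "You are so lucky to work from home.",
        "context": "A friend congratulates someone on improved work-life balance.",
        "label": "benign",
    },
    {
        "utterance": "You are so lucky to work from home.",
        "context": "A colleague dismisses someone struggling with childcare and burnout.",
        "label": "toxic",
    },
]


def to_jsonl(rows):
    """Serialize one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in rows) + "\n"


def to_csv(rows):
    """Serialize rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["utterance", "context", "label"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


jsonl_text = to_jsonl(records)
csv_text = to_csv(records)
```

Both records share the same utterance and differ only in context and label, which is the contrast the dataset is meant to capture.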