Diverse Deception Probes (Collection): Linear probes trained on diverse deception data to detect dishonest completions across model families (OLMo, Qwen, Gemma). • 5 items • Updated 30 days ago
Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks. Paper • 2602.14689 • Published Feb 16