FAR AI

non-profit

https://far.ai/

AlignmentResearch

AI & ML interests

Frontier alignment research to ensure the safe development and deployment of advanced AI systems.

Recent Activity

chrisjcundy updated a dataset 3 days ago

AlignmentResearch/roleplay-base-examples

chrisjcundy published a dataset 3 days ago

AlignmentResearch/roleplay-base-examples

chrisjcundy updated a dataset 12 days ago

AlignmentResearch/model-self-knowledge-gemma27b

Papers

Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks

published and updated a dataset 3 days ago

AlignmentResearch/roleplay-base-examples

Viewer • Updated 3 days ago • 2.92k • 19

published and updated a dataset 12 days ago

AlignmentResearch/model-self-knowledge-gemma27b

Viewer • Updated 12 days ago • 6.33k • 58

updated a collection 30 days ago

Diverse Deception Probes

Linear probes trained on diverse deception data to detect dishonest completions across model families (OLMo, Qwen, Gemma). • 5 items • Updated 30 days ago

published and updated a model 30 days ago

AlignmentResearch/diverse-deception-probe-olmo-3-32b-think

Updated 30 days ago

published and updated a model 30 days ago

AlignmentResearch/diverse-deception-probe-gemma-3-12b-it

Updated 30 days ago

published and updated a model 30 days ago

AlignmentResearch/diverse-deception-probe-qwen3-8b

Updated 30 days ago

published and updated a model 30 days ago

AlignmentResearch/diverse-deception-probe-olmo-3-7b-instruct

Updated 30 days ago

published and updated a model 30 days ago

AlignmentResearch/diverse-deception-probe-olmo-3-7b-think

Updated 30 days ago

submitted a paper to Daily Papers about 2 months ago

Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks

Paper • 2602.14689 • Published Feb 16 • 1