AI Safety & Interpretability Lab

non-profit
https://aisilab.github.io/
aisilab

AI & ML interests

Interpretability-informed control

Recent Activity

EvilScript  new activity 44 minutes ago
aisilab/moltbook-files-new-language-signals: Add paper link, GitHub repository, and task category
EvilScript  authored a paper about 20 hours ago
Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion
EvilScript  submitted a paper about 21 hours ago
Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion

Papers

Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion

Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals


Lukas Galke Poech, Stine Beltoft, William Brach, Federico Torrielli

aisilab's collections (1)

Moltbook Models
  • filter-with-espresso/Qwen2.5-14B-Instruct-reddit-baseline-v3-high

    Updated Mar 17
  • filter-with-espresso/Qwen2.5-14B-Instruct-reddit-baseline-v2-low

    Updated Mar 16
  • filter-with-espresso/Qwen2.5-14B-Instruct-reddit-baseline-v1

    Updated Mar 16
  • filter-with-espresso/Qwen2.5-14B-Instruct-moltbook-finetune-v9

    Updated Mar 15 • 1