AI Safety & Interpretability Lab

non-profit
https://aisilab.github.io/
aisilab

AI & ML interests

Interpretability-informed control

Recent Activity

EvilScript  new activity about 17 hours ago
aisilab/moltbook-files-new-language-signals:Add paper link, GitHub repository, and task category
EvilScript  authored a paper 1 day ago
Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion
EvilScript  submitted a paper 1 day ago
Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion

Papers

Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion

Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals


Lukas Galke Poech, Stine Beltoft, William Brach, Federico Torrielli

aisilab's datasets (3)

aisilab/moltbook-files-new-language-signals

Viewer • Updated about 17 hours ago • 518 • 220

aisilab/moltbook-files

Viewer • Updated 27 days ago • 232k • 95

aisilab/moltbook-embeddings

Viewer • Updated 29 days ago • 189k • 176