AI & ML interests
Interpretability-informed control
Papers
Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion
Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals
Models (0)
None public yet