Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion
Paper • 2605.31170
Interpretability-informed control
Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion
Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals