aisilab/moltbook-files-new-language-signals
Interpretability-informed control
Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion
Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals