Mechanistic Interpretability Benchmark

Principled evaluation of mechanistic interpretability methods.