InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques

InterpBench introduces a collection of semi-synthetic transformers trained with Strict Interchange Intervention Training (Strict IIT) to rigorously evaluate mechanistic interpretability techniques while maintaining realis...

Level: advanced

By Unknown

Category: research