Extracting Rule-based Descriptions of Attention Features in Transformers

This research introduces a rule-based framework for mechanistic interpretability in transformers, using skip-gram, absence, and counting rules to systema...

Level: advanced

By Dan Friedman, Adithya Bhaskar, Alexander Wettig, Danqi Chen

Category: research