Latent Algorithmic Structure Precedes Grokking: A Mechanistic Study of ReLU MLPs on Modular Arithmetic
This study investigates grokking in ReLU MLPs, revealing that correct algorithms are encoded during initial memorization via latent structures rather than di...