Latent Algorithmic Structure Precedes Grokking: A Mechanistic Study of ReLU MLPs on Modular Arithmetic

This study investigates grokking in ReLU MLPs, revealing that correct algorithms are encoded during initial memorization via latent structures rather than di...

Level: advanced

By Anand Swaroop

Category: research