Explore the non-uniform utilization of layer depth in transformers through a 'Guess-then-Refine' framework that reveals how early layers generate high-freque...
Level: advanced
By Akshat Gupta, Jay Yeung, Gopala Anumanchipalli, Anna Ivanova
Category: research