How Do LLMs Use Their Depth?

Explore the non-uniform utilization of layer depth in transformers through a 'Guess-then-Refine' framework that reveals how early layers generate high-freque...

Level: advanced

By Akshat Gupta, Jay Yeung, Gopala Anumanchipalli, Anna Ivanova

Category: research