How Transformers Learn In-Context Recall Tasks? Optimality, Training Dynamics and Generalization

This research establishes theoretical bounds for transformer optimality in in-context recall, revealing how attention design and parameterization dictate gen...

Level: expert

By Quan Nguyen, Thanh Nguyen-Tang

Category: research