What Makes Looped Transformers Perform Better Than Non-Recursive Ones (Provably)

This research establishes the provable superiority of looped transformers over non-recursive models by analyzing loss landscape geometry and the SHIFT framew...

Level: expert

By Unknown

Category: research