Finite-Time Analysis of Gradient Descent for Shallow Transformers

This paper establishes nonasymptotic convergence bounds for gradient descent in shallow Transformers within the kernel regime, revealing sequence-length inde...

Level: expert

By Enes Arda, Semih Cayci, Atilla Eryilmaz

Category: research