This paper establishes nonasymptotic convergence bounds for gradient descent in shallow Transformers within the kernel regime, revealing sequence-length inde...
Level: expert
By Enes Arda, Semih Cayci, Atilla Eryilmaz
Category: research