This research investigates the stability dynamics of Transformers under Layer Normalization, introducing residual step scaling to mitigate hidden state growt...
Level: advanced
By Unknown
Category: research