Stability of Transformers under Layer Normalization

This research investigates the stability dynamics of Transformers under Layer Normalization, introducing residual step scaling to mitigate hidden state growt...

Level: advanced

By Unknown

Category: research