Better NTK Conditioning: A Free Lunch from (ReLU) Nonlinear Activation in Wide Neural Networks
This work shows that the ReLU activation inherently improves the conditioning of the Neural Tangent Kernel (NTK) in wide neural networks: by enhancing the separation between input features, it reduces the NTK's condition number relative to a comparable linear network, which in turn stabilizes and speeds up gradient-descent training, effectively for free.
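To make the claim concrete, below is a minimal sketch, not from the paper, that compares the condition number of the linear kernel XXᵀ (the NTK of a linear network, up to scaling) against the closed-form infinite-width NTK of a two-layer ReLU network built from the standard Cho–Saul arc-cosine kernels. The function name `relu_ntk`, the noise level `eps`, and the nearly-parallel test data are illustrative assumptions; the qualitative outcome, a markedly smaller condition number for the ReLU NTK on poorly separated inputs, is what the title asserts.

```python
import numpy as np

def relu_ntk(X):
    """Closed-form infinite-width NTK of a two-layer ReLU network,
    up to constant scaling (Cho-Saul arc-cosine kernels).
    Rows of X are assumed to have unit norm."""
    U = np.clip(X @ X.T, -1.0, 1.0)                 # cosine similarities
    theta = np.arccos(U)                            # angles between inputs
    k0 = (np.pi - theta) / np.pi                    # order-0 arc-cosine kernel
    k1 = (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi  # order-1
    return k1 + U * k0                              # NTK = K1 + <x,x'> * K0

rng = np.random.default_rng(0)
d, n, eps = 50, 20, 0.01

# Nearly parallel inputs: one shared direction plus small noise,
# normalized to the unit sphere -- a worst case for the linear kernel.
v = rng.standard_normal(d)
X = v + eps * rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

K_lin = X @ X.T            # NTK of a linear network, up to scaling
K_relu = relu_ntk(X)

print("linear kernel condition number:", np.linalg.cond(K_lin))
print("ReLU NTK condition number:     ", np.linalg.cond(K_relu))
```

The mechanism is visible in the kernel entries: for inputs at a small angle theta, the linear kernel separates them only to order theta², while the ReLU NTK separates them to order theta, so its smallest eigenvalue degrades far more slowly as inputs become collinear.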