A Scalable Measure of Loss Landscape Curvature for Analyzing the Training Dynamics of LLMs

This research introduces critical sharpness, a scalable measure of loss landscape curvature for analyzing LLM training dynamics without the cost of computing the full Hessian...

Level: advanced

By Dayal Singh Kalra, Jean-Christophe Gagnon-Audet, Andrey Gromov, Ishita Mediratta, Kelvin Niu, Alexander H Miller, Michael Shvartsman

Category: research