This research introduces critical sharpness, a scalable metric for analyzing LLM training dynamics without the computational cost of full Hessian calculation...
Level: advanced
By Dayal Singh Kalra, Jean-Christophe Gagnon-Audet, Andrey Gromov, Ishita Mediratta, Kelvin Niu, Alexander H Miller, Michael Shvartsman
Category: research