Variance-Adaptive Muon: Accelerating LLM Pretraining with NSR-Modulated and Variance-Scaled Momentum

Explore variance-adaptive variants of the Muon optimizer, Muon-NSR and Muon-VS, designed to accelerate LLM pretraining through orthogonal momentum updates an...

Level: advanced

By Jingru Li, Yibo Fan, Huan Li

Category: research
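The summary refers to the "orthogonal momentum updates" of the base Muon optimizer. The sketch below shows that baseline update in NumPy. It is not Muon-NSR or Muon-VS, whose NSR modulation and variance scaling are not described in this section.

The sketch assumes a 2-D weight matrix and plain heavy-ball momentum. It leaves out the Nesterov variant and the per-shape learning-rate scaling that some Muon implementations use. The quintic Newton-Schulz coefficients (3.4445, -4.7750, 2.0315) are the ones from the public Muon reference code. The function names `newton_schulz_orthogonalize` and `muon_step`, and the default `lr` and `momentum` values, are hypothetical and chosen only for illustration.

```python
import numpy as np


def newton_schulz_orthogonalize(g, steps=5, eps=1e-7):
    """Approximately map g toward U V^T (from g's SVD) with a quintic Newton-Schulz iteration.

    The coefficients trade exact convergence for speed, so singular values end up
    near 1 (roughly 0.7 to 1.2) rather than exactly 1.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    # Frobenius normalization guarantees every singular value is <= 1 before iterating.
    x = g / (np.linalg.norm(g) + eps)
    # Work with the wide orientation so the Gram matrix x @ x.T is the smaller one.
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        # Applies the odd polynomial a*x + b*x^3 + c*x^5 to each singular value of x.
        x = a * x + (b * s + c * s @ s) @ x
    return x.T if transposed else x


def muon_step(w, grad, buf, lr=0.02, momentum=0.95):
    """One hypothetical baseline Muon step: accumulate momentum, orthogonalize it, then step."""
    buf = momentum * buf + grad
    update = newton_schulz_orthogonalize(buf)
    return w - lr * update, buf


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32))
buf = np.zeros_like(w)
grad = rng.standard_normal((64, 32))
w, buf = muon_step(w, grad, buf)
```

The update direction depends only on the orthogonalized momentum, so each step moves the weights by a similar amount along every singular direction, whatever the gradient's original spectrum. Per the title, the variants described in this work adapt this momentum using variance information. How they do so is not shown here.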