AGGC: Adaptive Group Gradient Clipping for Stabilizing Large Language Model Training

AGGC introduces an adaptive group-wise gradient clipping framework that partitions model parameters into groups to dynamically stabilize large language model training. T...
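The summary above is truncated, so it does not give AGGC's actual grouping or threshold rule. The sketch below only illustrates the general idea of group-wise adaptive clipping, and it rests on several assumptions. Groups are given as named lists of flattened gradient tensors. Each group's threshold is `scale` times an exponential moving average (EMA) of that group's past clipped gradient norms, with `init_threshold` used on the first step. The names `AdaptiveGroupClipper`, `group_norm`, `beta`, `scale` and `init_threshold` are hypothetical. This is not the authors' method.

```python
import math
from typing import Dict, List

# A gradient "tensor" is modeled as a flat list of floats; a group is a list of tensors.
Group = List[List[float]]


def group_norm(grads: Group) -> float:
    """L2 norm over every gradient entry in one parameter group."""
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))


class AdaptiveGroupClipper:
    """Hypothetical sketch of per-group clipping with an EMA-based adaptive threshold.

    Assumption (not from the AGGC paper): a group's threshold is `scale` times an
    EMA of that group's past clipped norms, or `init_threshold` on the first step.
    """

    def __init__(self, beta: float = 0.9, scale: float = 2.0, init_threshold: float = 1.0):
        self.beta = beta
        self.scale = scale
        self.init_threshold = init_threshold
        self.ema: Dict[str, float] = {}  # per-group EMA of clipped gradient norms

    def clip(self, groups: Dict[str, Group]) -> Dict[str, Group]:
        clipped: Dict[str, Group] = {}
        for name, grads in groups.items():
            norm = group_norm(grads)
            # The threshold uses only history from earlier steps.
            if name in self.ema:
                threshold = self.scale * self.ema[name]
            else:
                threshold = self.init_threshold
            # Scale the group down only if its norm exceeds its own threshold.
            factor = min(1.0, threshold / (norm + 1e-12))
            clipped[name] = [[g * factor for g in tensor] for tensor in grads]
            # Update the EMA with the clipped norm so one spike cannot inflate the threshold.
            clipped_norm = norm * factor
            prev = self.ema.get(name, clipped_norm)
            self.ema[name] = self.beta * prev + (1.0 - self.beta) * clipped_norm
        return clipped
```

For example, with `init_threshold=1.0`, a group named `"attention"` whose gradient is `[[3.0, 4.0]]` (norm 5) is scaled down to norm 1. A group named `"mlp"` whose gradient is `[[0.3, 0.4]]` (norm 0.5) passes through unchanged. Clipping per group keeps a spike in one group from shrinking the updates of every other group, which is what happens with a single global norm.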

Level: advanced

By Zhiyuan Li, Yuan Wu, Yi Chang

Category: research