AGGC introduces an adaptive group-wise gradient clipping framework that partitions model parameters into groups and clips each group's gradients adaptively to stabilize large language model training. T...
Level: advanced
By Zhiyuan Li, Yuan Wu, Yi Chang
Category: research
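The summary above does not spell out AGGC's adaptive rule. The sketch below illustrates group-wise clipping in general. Each parameter group is clipped against its own threshold, and that threshold is assumed to be a multiple (`ratio`) of an exponential moving average of the group's past gradient norms. The class name `GroupwiseClipper`, the EMA update, and the parameters `beta` and `ratio` are hypothetical choices made for illustration. They are not taken from the paper.

```python
import math
from typing import Dict, List


class GroupwiseClipper:
    """Hypothetical per-group adaptive clipper (illustrative, not AGGC's exact rule).

    Each parameter group keeps an exponential moving average (EMA) of its own
    gradient norm. A group's gradients are rescaled whenever its current norm
    exceeds `ratio` times that EMA.
    """

    def __init__(self, beta: float = 0.9, ratio: float = 2.0) -> None:
        self.beta = beta      # assumed EMA decay factor
        self.ratio = ratio    # assumed tolerance above the running norm
        self.ema: Dict[str, float] = {}

    def clip(self, grads: Dict[str, List[float]]) -> Dict[str, List[float]]:
        clipped: Dict[str, List[float]] = {}
        for name, g in grads.items():
            norm = math.sqrt(sum(x * x for x in g))  # L2 norm of this group
            if name not in self.ema:
                # First observation: seed the EMA and leave gradients untouched.
                self.ema[name] = norm
                clipped[name] = list(g)
                continue
            threshold = self.ratio * self.ema[name]
            scale = min(1.0, threshold / norm) if norm > 0 else 1.0
            clipped[name] = [x * scale for x in g]
            # Update the EMA with the post-clipping norm, so a spike's
            # contribution is capped at the current threshold.
            self.ema[name] = self.beta * self.ema[name] + (1 - self.beta) * norm * scale
        return clipped


clipper = GroupwiseClipper(beta=0.9, ratio=2.0)
clipper.clip({"attn": [3.0, 4.0], "mlp": [1.0, 0.0]})          # seeds EMAs: 5.0 and 1.0
out = clipper.clip({"attn": [30.0, 40.0], "mlp": [0.5, 0.0]})
print(out)  # attn norm 50 exceeds 2 * 5.0, so attn is scaled by 0.2; mlp is unchanged
```

Because every group has its own threshold, a gradient spike in one group (here `attn`) is rescaled without shrinking the gradients of another group (`mlp`). A single global-norm clip would instead scale every group by the same factor.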