This research introduces SAGE, a novel optimizer designed to overcome the memory bottlenecks of AdamW in large language model training by effectively managin...
Level: advanced
By Wooin Lee
Category: research