SAGE: Sign-Adaptive Gradient for Memory-Efficient LLM Optimization

This research introduces SAGE, a novel optimizer designed to overcome the memory bottlenecks of AdamW in large language model training by effectively managing…
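Since this summary does not spell out SAGE's update rule, the sketch below illustrates only the general sign-based optimizer pattern the title alludes to: keep a single momentum buffer per parameter and use the sign of that buffer as the update direction, which roughly halves optimizer state relative to AdamW's two moment buffers. The function name `sign_momentum_step` and all hyperparameters are illustrative assumptions (in the spirit of signSGD with momentum), not SAGE's actual algorithm.

```python
import torch

def sign_momentum_step(params, grads, momentum, lr=1e-4, beta=0.9, weight_decay=0.01):
    """One hypothetical sign-based update keeping a single state buffer per parameter.

    Illustrative only: SAGE's real update rule is not given in this summary.
    This follows the generic signSGD-with-momentum pattern, which stores one
    buffer (momentum) instead of AdamW's two (first and second moments).
    """
    for p, g, m in zip(params, grads, momentum):
        m.mul_(beta).add_(g, alpha=1 - beta)   # exponential moving average of gradients
        p.mul_(1 - lr * weight_decay)          # decoupled weight decay, as in AdamW
        p.add_(torch.sign(m), alpha=-lr)       # step in the sign of the momentum
```

Because the update direction is just `sign(m)`, the per-coordinate magnitude information that AdamW keeps in its second-moment buffer is discarded, which is where the memory saving in this style of optimizer comes from.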

Level: advanced

By Wooin Lee

Category: research