ACCO introduces a novel approach to distributed sharded LLM training by synchronizing delayed gradients to minimize communication overhead and GPU idle time,...
Level: advanced
By Unknown
Category: research