TawPipe: Topology-Aware Weight Pipeline Parallelism for Accelerating Long-Context Large Models Training

TawPipe introduces a topology-aware weight pipeline parallelism method designed to minimize communication overhead and enhance scalability for long-context l...

Level: advanced

By Houming Wu, Ling Chen

Category: research