DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism

DCP (Dynamic Context Parallelism) addresses input dynamism in long-context training by optimizing device mapping to reduce communication overhead...

Level: advanced

By Unknown

Category: research