MTraining: Distributed Dynamic Sparse Attention for Efficient Ultra-Long Context Training

Explore MTraining, a distributed framework that uses dynamic sparse attention to enable efficient training of ultra-long-context LLMs with 6x throughp...

Level: advanced

By Wenxuan Li, Chengruidong Zhang, Huiqiang Jiang, Yucheng Li, Yuqing Yang, Lili Qiu

Category: research