Explore MTraining, a novel distributed framework that uses dynamic sparse attention to enable efficient training of ultra-long-context LLMs with 6x throughp...
Level: advanced
By Wenxuan Li, Chengruidong Zhang, Huiqiang Jiang, Yucheng Li, Yuqing Yang, Lili Qiu
Category: research