Layer-adaptive Expert Pruning for Pre-Training of Mixture-of-Experts Large Language Models

Explore Layer-Adaptive Expert Pruning (LAEP), a novel pre-training strategy that dynamically optimizes Mixture-of-Experts models by leveraging token distribu...

Level: advanced

By YuanLab.ai and 11 other authors

Category: research