Explore Layer-Adaptive Expert Pruning (LAEP), a novel pre-training strategy that dynamically optimizes Mixture-of-Experts models by leveraging token distribu...
Level: advanced
By YuanLab.ai and 11 other authors
Category: research