Explore L-MoE, a novel architecture merging Mixture of Experts with Low-Rank Adaptation to achieve end-to-end training with 10% of the parameters of dense models...
Level: advanced
By Shihao Ji, Zihui Song
Category: research