Explore MC#, a framework that combines static quantization with dynamic expert pruning to compress Mixture-of-Experts models with minimal accuracy loss. Th...
Level: advanced
By Unknown
Category: research