MC#: Mixture Compressor for Mixture-of-Experts Large Models

Explore MC#, a novel framework leveraging static quantization and dynamic expert pruning to compress Mixture-of-Experts models with minimal accuracy loss. Th...

Level: advanced

By Unknown

Category: research