COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens

Explore COMPACT, a novel pruning method that targets both channels and tokens in large language models such as Qwen and LLaMA for efficient deployment on con...

Level: advanced

By Unknown

Category: research