EcoSpa: Efficient Transformer Training with Coupled Sparsity

EcoSpa introduces a novel structured sparsity framework for Transformers that jointly prunes coupled weight matrices while preserving critical cross-matrix i...

Level: advanced

By Jinqi Xiao and 10 other authors

Category: research