Distill-then-Replace: Efficient Task-Specific Hybrid Attention Model Construction

Explore the Distill-then-Replace method, a novel approach for constructing efficient hybrid attention models by transferring weights from full-attention blocks…

Level: advanced

By Xiaojie Xia, Huigang Zhang, Chaoliang Zhong, Jun Sun, Yusuke Oishi

Category: research