Explore the Distill-then-Replace method, a novel approach for constructing efficient hybrid attention models by transferring weights from full-attention blocks...
Level: advanced
By Xiaojie Xia, Huigang Zhang, Chaoliang Zhong, Jun Sun, Yusuke Oishi
Category: research
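Only the truncated teaser above is available, so the following is a minimal, hypothetical sketch of the general idea it names: replace a trained full-attention block with an efficient (here, kernelized linear) attention block that reuses the teacher's projection weights, then distill the student's outputs toward the teacher's. The class names, the ELU+1 linear-attention kernel, and the output-matching loss are illustrative assumptions, not the authors' exact procedure.

```python
# Hypothetical sketch of a distill-then-replace step for one attention block.
# Assumptions (not from the paper): ELU+1 linear attention as the efficient
# replacement, and simple MSE output distillation after weight transfer.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FullAttention(nn.Module):
    """Standard softmax self-attention (the 'teacher' block)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.dim, self.heads = dim, heads
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))
        self.out = nn.Linear(dim, dim)

    def _proj(self, x):
        b, n, d = x.shape
        h = self.heads
        return (p(x).view(b, n, h, d // h).transpose(1, 2) for p in (self.q, self.k, self.v))

    def forward(self, x):
        b, n, d = x.shape
        q, k, v = self._proj(x)
        attn = (q @ k.transpose(-2, -1)) * (d // self.heads) ** -0.5
        y = attn.softmax(dim=-1) @ v
        return self.out(y.transpose(1, 2).reshape(b, n, d))


class LinearAttention(FullAttention):
    """Efficient replacement: ELU+1 kernelized linear attention, same projection shapes."""

    def forward(self, x):
        b, n, d = x.shape
        q, k, v = self._proj(x)
        q, k = F.elu(q) + 1, F.elu(k) + 1
        kv = k.transpose(-2, -1) @ v                                  # (b, h, e, e)
        z = 1.0 / (q @ k.sum(dim=2, keepdim=True).transpose(-2, -1) + 1e-6)
        y = (q @ kv) * z                                              # (b, h, n, e)
        return self.out(y.transpose(1, 2).reshape(b, n, d))


def distill_then_replace(teacher: FullAttention, x: torch.Tensor, steps: int = 100):
    """Copy the teacher's weights into the efficient student, then distill its outputs."""
    student = LinearAttention(teacher.dim, teacher.heads)
    student.load_state_dict(teacher.state_dict())                     # weight transfer
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    with torch.no_grad():
        target = teacher(x)                                           # teacher outputs to match
    for _ in range(steps):
        loss = F.mse_loss(student(x), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student


if __name__ == "__main__":
    torch.manual_seed(0)
    teacher = FullAttention(dim=64)
    x = torch.randn(2, 16, 64)
    student = distill_then_replace(teacher, x)
    print("distillation error:", F.mse_loss(student(x), teacher(x)).item())
```

In a hybrid model, a step like this would presumably be applied only to the blocks selected for replacement, leaving the remaining full-attention blocks untouched.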