Data Efficient Any Transformer-to-Mamba Distillation via Attention Bridge

Explore Cross-Architecture Distillation via Attention Bridge (CAB), a novel framework enabling data-efficient knowledge transfer from Transformers to Sta...

Level: advanced

By Penghao Wang, Yuhao Zhou, Mengxuan Wu, Panpan Zhang, Zhangyang Wang, Kai Wang

Category: research