Explore the Differential Transformer framework, which utilizes noise-canceled attention and the lightweight DEX module to enhance pretrained model expressivi...
Level: advanced
By Chaerin Kong, Jiho Jang, Nojun Kwak
Category: research