NoiseFormer -- Noise Diffused Symmetric Attention Transformer

NoiseFormer introduces a novel Noise Diffused Symmetric Attention (NDSA) mechanism that leverages sparse attention to optimize gradient flow and reduce memory usage in transformers without sacri...

Level: advanced

By Phani Kumar Nyshadham, Jyothendra Varma Polisetty V R K, Aditya Rathore

Category: research