NoiseFormer introduces a novel NDSA mechanism that leverages sparse attention to optimize gradient flow and reduce memory usage in transformers without sacri...
Level: advanced
By Phani Kumar Nyshadham, Jyothendra Varma Polisetty V R K, Aditya Rathore
Category: research