This research introduces CSAttention, a training-free sparse attention method that accelerates long-context LLM inference by shifting computation to an offli...
Level: advanced
By Chuxu Song
Category: research