CSAttention: Centroid-Scoring Attention for Accelerating LLM Inference

This research introduces CSAttention, a training-free sparse attention method that accelerates long-context LLM inference by shifting computation to an offli...
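The summary above is truncated and does not spell out the mechanism, but the name "centroid-scoring" and the offline/online split suggest a common sparse-attention pattern: cluster the cached keys into centroids offline, then at decode time score the query against centroids only and run exact attention over the selected clusters. The sketch below is an assumption about that pattern, not CSAttention's actual algorithm; all function names and the toy k-means are illustrative.

```python
import numpy as np

def centroid_sparse_attention(q, K, V, n_clusters=4, top_c=2, seed=0):
    """Hypothetical sketch of centroid-scored sparse attention.

    Offline: cluster the cached keys into centroids (toy k-means here).
    Online: score the query against the few centroids, keep the top-c
    clusters, and run exact softmax attention over only those keys.
    """
    rng = np.random.default_rng(seed)
    n, d = K.shape
    # --- offline phase: k-means over the keys ---
    centroids = K[rng.choice(n, n_clusters, replace=False)].copy()
    for _ in range(10):
        # assign each key to its nearest centroid
        assign = np.argmin(
            ((K[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1
        )
        for c in range(n_clusters):
            members = K[assign == c]
            if len(members):
                centroids[c] = members.mean(0)
    # --- online phase: cheap centroid scoring, then exact attention ---
    cluster_scores = centroids @ q          # n_clusters dot products, not n
    keep = np.argsort(cluster_scores)[-top_c:]
    mask = np.isin(assign, keep)
    Ks, Vs = K[mask], V[mask]
    logits = Ks @ q / np.sqrt(d)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ Vs
```

The point of the offline/online split is that the query only pays for `n_clusters` centroid scores plus attention over the selected clusters, instead of all `n` keys.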

Level: advanced

By Chuxu Song

Category: research