Flux Attention: Context-Aware Hybrid Attention for Efficient LLM Inference

Explore Flux Attention, a novel hybrid framework designed to overcome scalability bottlenecks in long-context LLMs through dynamic layer-level routing between...

Level: advanced

By Quantong Qiu

Category: research