SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs

Explore SpeContext, a novel architecture leveraging speculative context sparsity and distilled models to achieve massive throughput improvements in long-cont...

Level: advanced

By Jiaming Xu and 6 other authors

Category: research