SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs
Explore SpeContext, a novel architecture leveraging speculative context sparsity and distilled models to achieve massive throughput improvements in long-cont...