Efficient In-Memory Acceleration of Sparse Block Diagonal LLMs

This research introduces a framework for accelerating sparse LLM inference on compute-in-memory (CIM) accelerators using block-diagonal sparsity, achieving significant memory an...
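
The summary does not describe how the framework maps weights onto CIM arrays. The sketch below only illustrates the general idea of block-diagonal sparsity and why it saves memory. It assumes a weight matrix whose nonzeros lie entirely in diagonal blocks, so only those blocks are stored and multiplied, and off-diagonal blocks are skipped. The function names are hypothetical and do not come from the paper. It is not the authors' hardware mapping.

```python
from typing import List

Matrix = List[List[float]]


def block_diag_matvec(blocks: List[Matrix], x: List[float]) -> List[float]:
    """Compute y = W @ x, where W is block-diagonal and given only by its diagonal blocks.

    Off-diagonal blocks are implicitly zero, so they are never stored or multiplied.
    """
    y: List[float] = []
    offset = 0
    for block in blocks:
        cols = len(block[0])
        x_slice = x[offset:offset + cols]  # the input segment this block consumes
        for row in block:
            y.append(sum(w * v for w, v in zip(row, x_slice)))
        offset += cols
    if offset != len(x):
        raise ValueError("block column sizes do not add up to len(x)")
    return y


def stored_weights(blocks: List[Matrix]) -> int:
    """Number of weights kept when only the diagonal blocks are stored."""
    return sum(len(b) * len(b[0]) for b in blocks)


def dense_weights(blocks: List[Matrix]) -> int:
    """Number of weights the equivalent dense matrix would need."""
    rows = sum(len(b) for b in blocks)
    cols = sum(len(b[0]) for b in blocks)
    return rows * cols
```

Consider a matrix split into k equal square diagonal blocks. Only 1/k of the dense weights need to be stored. For example, an 8x8 matrix with four 2x2 blocks keeps 16 weights instead of 64.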

Level: advanced

By Unknown

Category: research