This research introduces a framework for accelerating sparse large language model (LLM) inference on compute-in-memory (CIM) accelerators using block-diagonal sparsity, achieving significant memory an...
Level: advanced
By Unknown
Category: research