FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference

FLRC introduces a novel low-rank compression framework that optimizes layer-specific rank allocation and progressive decoding to enhance LLM inference effici...
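The summary does not say how FLRC decides each layer's rank. The block below is a hypothetical sketch of one generic approach to layer-specific rank allocation, not FLRC's actual criterion. It assumes each weight matrix is already decomposed by SVD, with singular values sorted in descending order. It also assumes that keeping one more rank-1 component of an m x n layer costs m + n parameters and retains sigma_k^2 of that layer's squared Frobenius norm. Under a single global parameter budget, it greedily keeps whichever component retains the most energy per parameter. The names `allocate_ranks`, `singular_values`, `shapes`, and `param_budget` are illustrative only.

```python
import heapq
from typing import Dict, List, Tuple


def allocate_ranks(
    singular_values: Dict[str, List[float]],
    shapes: Dict[str, Tuple[int, int]],
    param_budget: int,
) -> Dict[str, int]:
    """Hypothetical greedy per-layer rank allocation under a parameter budget.

    Assumes singular_values[name] is sorted in descending order and that each
    kept rank-1 component of an (m x n) layer costs m + n parameters.
    """
    ranks = {name: 0 for name in singular_values}
    # Max-heap via negated scores: (-(retained energy per parameter), layer).
    heap: List[Tuple[float, str]] = []
    for name, sv in singular_values.items():
        if sv:
            m, n = shapes[name]
            heapq.heappush(heap, (-(sv[0] ** 2) / (m + n), name))

    used = 0
    while heap:
        _, name = heapq.heappop(heap)
        m, n = shapes[name]
        cost = m + n
        if used + cost > param_budget:
            # This layer's per-component cost is fixed and `used` only grows,
            # so none of its later components can fit either; drop the layer.
            continue
        used += cost
        ranks[name] += 1
        k = ranks[name]
        sv = singular_values[name]
        if k < len(sv):
            # Queue the layer's next (smaller) singular component.
            heapq.heappush(heap, (-(sv[k] ** 2) / cost, name))
    return ranks


if __name__ == "__main__":
    sv = {"q_proj": [4.0, 2.0, 1.0], "mlp": [3.0, 3.0, 0.5]}
    shapes = {"q_proj": (4, 4), "mlp": (4, 8)}
    print(allocate_ranks(sv, shapes, param_budget=40))
```

With a budget of 40 parameters in this toy example, both layers keep rank 2. The third component of each layer has too little energy per parameter to fit in what remains of the budget. Because a layer with a flat singular-value spectrum loses more by truncation, a budget-driven scheme like this tends to give it a higher rank than a layer whose energy is concentrated in a few components. This is the general motivation for non-uniform, layer-specific ranks.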

Level: advanced

By Unknown

Category: research