FlexQuant: A Flexible and Efficient Dynamic Precision Switching Framework for LLM Quantization

Explore FlexQuant, a novel framework enabling layer-wise dynamic precision switching to optimize LLM inference. This research details how adaptive bit-width ...

Level: advanced

By Fangxin Liu and 8 other authors

Category: research