FlexQuant: A Flexible and Efficient Dynamic Precision Switching Framework for LLM Quantization
Explore FlexQuant, a novel framework enabling layer-wise dynamic precision switching to optimize LLM inference. This research details how adaptive bit-width ...