MoBiQuant: Mixture-of-Bits Quantization for Token-Adaptive Elastic LLMs

MoBiQuant introduces a mixture-of-bits quantization framework that dynamically adjusts weight precision per token sequence to enhance LLM inference eff...

Level: advanced

By Dongwei Wang and 10 other authors

Category: research