MoBiQuant introduces a novel mixture-of-bits quantization framework that dynamically adjusts weight precision per token sequence to enhance LLM inference eff...
Level: advanced
By Dongwei Wang and 10 other authors
Category: research