This research evaluates post-training quantization baselines for reasoning LLMs on Huawei's Ascend NPU, revealing critical trade-offs between compression and...
Level: advanced
By Yuchen Luo, Fangyue Zhu, Ruining Zhou, Mingzhe Huang, Jian Zhu, Fanyu Fan, Wei Shao
Category: research