Not All Bits Are Equal: Scale-Dependent Memory Optimization Strategies for Reasoning Models
This research identifies critical limitations of 4-bit quantization for reasoning models and introduces scale-dependent strategies that optimize memory allocation...