A KL Lens on Quantization: Fast, Forward-Only Sensitivity for Mixed-Precision SSM-Transformer Models
This research introduces a backpropagation-free sensitivity analysis framework that uses KL divergence to optimize hybrid SSM-Transformer models for edge deployment.
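The core idea can be illustrated with a minimal, hypothetical sketch: quantize one layer at a time, run forward passes only, and score each layer's sensitivity by the KL divergence between the full-precision and perturbed output distributions. All names, the toy network, and the symmetric uniform quantizer below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q), averaged over the batch
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def quantize(w, bits):
    # symmetric uniform quantization (an illustrative choice)
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def layer_sensitivity(weights, x, bits=4):
    """Forward-only KL sensitivity: quantize one layer at a time and
    compare output distributions against the full-precision baseline.
    Requires no gradients, only extra forward passes."""
    def forward(ws):
        h = x
        for w in ws:
            h = np.tanh(h @ w)  # toy layer standing in for SSM/attention blocks
        return softmax(h)

    p_ref = forward(weights)  # full-precision reference distribution
    scores = []
    for i in range(len(weights)):
        ws = [quantize(w, bits) if j == i else w for j, w in enumerate(weights)]
        scores.append(kl_divergence(p_ref, forward(ws)))
    return scores  # higher score => layer is more sensitive to quantization

rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 8)) * 0.5 for _ in range(3)]
x = rng.standard_normal((16, 8))
print(layer_sensitivity(weights, x, bits=4))
```

In a mixed-precision setting, layers with high KL scores would be kept at higher bit-widths while low-score layers can be quantized more aggressively.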