PRISM: Deriving the Transformer as a Signal-Denoising Operator via Maximum Coding Rate Reduction
This research derives the Transformer architecture through Maximum Coding Rate Reduction, modeling attention as gradient ascent on a signal-noise manifold to...
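PRISM's exact objective is not reproduced here, but the Maximum Coding Rate Reduction framework it builds on centers on the coding-rate function R(Z) = ½ log det(I + (d/nε²) Z Zᵀ) for a d×n matrix of token representations Z (Yu, Chan, Ma et al.). Below is a minimal sketch, assuming that standard MCR² rate function, of one gradient-ascent step on Z; the step size, dimensions, and data are illustrative, not from the paper:

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """R(Z) = 1/2 logdet(I + d/(n*eps^2) * Z Z^T) for d x n representations Z."""
    d, n = Z.shape
    alpha = d / (n * eps**2)
    return 0.5 * np.linalg.slogdet(np.eye(d) + alpha * Z @ Z.T)[1]

def coding_rate_gradient(Z, eps=0.5):
    """Gradient dR/dZ = alpha * (I + alpha * Z Z^T)^{-1} Z, used for ascent."""
    d, n = Z.shape
    alpha = d / (n * eps**2)
    return alpha * np.linalg.solve(np.eye(d) + alpha * Z @ Z.T, Z)

# One gradient-ascent step on the token representations (columns of Z).
rng = np.random.default_rng(0)
Z = rng.standard_normal((8, 16))
Z_next = Z + 0.1 * coding_rate_gradient(Z)
print(coding_rate(Z_next) > coding_rate(Z))  # a small ascent step raises the rate
```

In the white-box-transformer line of work, an attention-like layer emerges as exactly such an incremental step that increases (or, for compression terms, decreases) a coding rate; this sketch only illustrates the ascent dynamic, not PRISM's derived attention operator.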