Mechanism Shift During Post-training from Autoregressive to Masked Diffusion Language Models
This research investigates the fundamental architectural shifts that occur when autoregressive models are transformed into masked diffusion models, revealing how internal ...