Mechanism Shift During Post-training from Autoregressive to Masked Diffusion Language Models

This research investigates the fundamental architectural shifts that occur when autoregressive models are transformed into masked diffusion models, revealing how internal ...

Level: advanced

By Unknown

Category: research