Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models
This research investigates how to accelerate masked diffusion language models by identifying which denoising steps are robust enough to be handled by smaller models, a...