Explore SPG, a novel policy gradient method designed to reduce bias in masked diffusion language models by leveraging bounds on the true log-likelihood for more eff...
Level: advanced
By Unknown
Category: research