LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models

Explore LLaDA 1.5 and VRPO (Variance-Reduced Preference Optimization), a technique designed to improve alignment and performance in large language diffusion mo...

Level: advanced

By Unknown

Category: research