Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models
This research introduces Boundary-Guided Policy Optimization (BGPO), a framework enabling memory-efficient reinforcement learning for diffusion large language models through linearized obje...