Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models

This research introduces BGPO, a novel framework enabling memory-efficient reinforcement learning (RL) for diffusion large language models through linearized obje...

Level: advanced

By Unknown

Category: research