Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models

Segment Policy Optimization introduces a novel segment-level framework for RL in LLMs, offering precise credit assignment and significant memory efficiency g...

Level: advanced

By Yiran Guo, Lijie Xu, Jie Liu, Dan Ye, Shuang Qiu

Category: research