Segment Policy Optimization introduces a novel segment-level framework for RL in LLMs, offering precise credit assignment and significant memory efficiency g...
Level: advanced
By Yiran Guo, Lijie Xu, Jie Liu, Dan Ye, Shuang Qiu
Category: research