Clip Your Sequences Fairly: Enforcing Length Fairness for Sequence-Level RL

Explore FSPO, a novel approach to enforcing length fairness in sequence-level reinforcement learning by addressing bias in importance-sampling weights throug...

Level: advanced

By Unknown

Category: research