Explore the critical disconnect between high SFT scores and actual RLVR performance. This research introduces robust metrics like generalization loss to ensu...
Level: advanced
By Unknown
Category: research