A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models

This survey explores Process Reward Models as a refined alternative to outcome-based approaches, detailing their lifecycle, training via reinforcement learni...

Level: advanced

By Congming Zheng and 10 other authors

Category: research