A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models
This survey explores Process Reward Models as a refined alternative to outcome-based approaches, detailing their lifecycle, training via reinforcement learni...
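The contrast between outcome-based and process-based reward signals can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the survey's method. The per-step scores are assumed to come from an already-trained process reward model. The function names and the `min`/`prod`/`last` aggregation options are hypothetical choices for this example. Taking the minimum or the product of step scores are two aggregation rules often used in practice, but the right rule depends on how the PRM was trained.

```python
from math import prod
from typing import Sequence


def outcome_reward(final_answer_correct: bool) -> float:
    """Outcome supervision: one scalar for the whole solution, judged only on the final answer."""
    return 1.0 if final_answer_correct else 0.0


def process_reward(step_scores: Sequence[float], how: str = "min") -> float:
    """Process supervision: collapse per-step scores into one solution-level score.

    Assumption (hypothetical): step_scores[i] is a PRM's estimate in [0, 1]
    that reasoning step i is correct.
    """
    if not step_scores:
        raise ValueError("need at least one step score")
    if how == "min":   # a solution is only as good as its weakest step
        return min(step_scores)
    if how == "prod":  # treat step scores as independent correctness probabilities
        return prod(step_scores)
    if how == "last":  # trust the score assigned to the final step
        return step_scores[-1]
    raise ValueError(f"unknown aggregation: {how}")


if __name__ == "__main__":
    # A solution that reaches the right answer through one flawed middle step.
    steps = [0.9, 0.2, 0.95]
    print(outcome_reward(True))   # 1.0: the outcome signal cannot see the bad step
    print(process_reward(steps))  # 0.2: the process signal flags it
```

The example shows why step-level signals are described as more fine-grained: the outcome reward gives full credit to a solution with a faulty intermediate step, while the aggregated process reward penalizes it.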