ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents

Explore ToolPRMBench, a novel step-level benchmark designed to rigorously evaluate Process Reward Models in tool-using AI agents through advanced multi-LLM v...

Level: advanced

By Dawei Li, Yuguang Yao, Zhen Tan, Huan Liu, Ruocheng Guo

Category: research