Explore ToolPRMBench, a step-level benchmark for evaluating Process Reward Models in tool-using AI agents through advanced multi-LLM v...
Level: advanced
By Dawei Li, Yuguang Yao, Zhen Tan, Huan Liu, Ruocheng Guo
Category: research