Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization
This research introduces Contribution-Weighted GRPO, a novel framework designed to stabilize the training of LLM search agents by integrating process supervi...