Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization

This research introduces Contribution-Weighted GRPO, a framework designed to stabilize the training of LLM search agents by integrating process supervi...

Level: advanced

By Junzhe Wang

Category: research