Hindsight-Anchored Policy Optimization: Turning Failure into Feedback in Sparse Reward Settings

This research introduces Hindsight-Anchored Policy Optimization (HAPO), a framework designed to resolve advantage collapse and distributional bias in s...

Level: expert

By Yuning Wu, Ke Wang, Devin Chen, Kai Wei

Category: research