Hindsight-Anchored Policy Optimization: Turning Failure into Feedback in Sparse Reward Settings
This research introduces Hindsight-Anchored Policy Optimization (HAPO), a novel framework designed to resolve advantage collapse and distributional bias in s...