Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting

This research investigates how on-policy data in reinforcement learning mitigates catastrophic forgetting during LLM post-training, demonstrating superior re...

Level: advanced

By Howard Chen, Noam Razin, Karthik Narasimhan, Danqi Chen

Category: research