Trust Region Reward Optimization and Proximal Inverse Reward Optimization Algorithm

Explore Trust Region Reward Optimization (TRRO), a novel, stable Inverse Reinforcement Learning framework that leverages minorization-maximization to achieve state-of-the-art reward recovery an...

Level: expert

By Unknown

Category: research