Why Policy Gradient Algorithms Work for Undiscounted Total-Reward MDPs

This research establishes rigorous convergence guarantees for policy gradient algorithms in undiscounted total-reward MDPs, introducing a novel transient vis...

Level: expert

By Jongmin Lee, Ernest K. Ryu

Category: research