This research establishes rigorous convergence guarantees for policy gradient algorithms in undiscounted total-reward MDPs, introducing a novel transient vis...
Level: expert
By Jongmin Lee, Ernest K. Ryu
Category: research