DeepSPI introduces an on-policy algorithm leveraging world models to ensure safe, monotonic policy improvement in reinforcement learning, bridging online and...
Level: advanced
By Unknown
Category: research