Adaptive Divergence Regularized Policy Optimization for Fine-tuning Generative Models

Explore ADRPO, a novel fine-tuning paradigm that dynamically adjusts regularization based on sample quality to enhance generative model performance across te...

Level: advanced

By Jiajun Fan, Tong Wei, Chaoran Cheng, Yuxin Chen, Ge Liu

Category: research