Explore GRPO-Verif, a novel unified loss function that jointly optimizes solution generation and self-verification in LLMs, enabling robust, real-time error ...
Level: advanced
By Xiaoxuan Wang, Bo Liu, Song Jiang, Jingzhou Liu, Jingyuan Qi, Xia Chen, Baosheng He
Category: research