Honesty over Accuracy: Trustworthy Language Models through Reinforced Hesitation
This research introduces Reinforced Hesitation, a reinforcement learning with verifiable rewards (RLVR) framework that uses ternary rewards to balance accuracy with honest abstention, offering a scalable solut...
Level: advanced
By Mohamad Amin Mohamadi, Tianhao Wang, Zhiyuan Li