Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition
This research introduces a novel reward decomposition framework to eliminate sycophancy in LLMs, addressing the failure of scalar reward models to distinguis...