This survey examines reward hacking as a systemic vulnerability in large language models, introducing the Proxy Compression Hypothesis to explain emergent ...
Level: advanced
By Xiaohua Wang
Category: discussion