Modeling Human Beliefs about AI Behavior for Scalable Oversight

This research addresses scalable oversight by introducing belief model covering, an approach for mitigating the misconceptions that human evaluators hold about the behavior of high-capacity AI systems. It de...

Level: advanced

By Unknown

Category: research