When should we train against a scheming monitor?

Explore a formal probabilistic framework for evaluating training against scheming monitors, analyzing the non-linear trade-offs between deception reduction a...

Level: expert

By Unknown

Category: discussion