This research explores critical vulnerabilities in which adaptive attacks use prompt injections to subvert trusted monitors, challenging current AI control protoco...
Level: advanced
By Unknown
Category: discussion