This research investigates critical vulnerabilities in frontier AI agents, revealing how semantic obfuscation allows sandbagging to evade current monitoring ...
Level: advanced
By Francis Rhys Ward and 8 other authors
Category: discussion