This research investigates the conditions under which language model agents disclose suspected misconduct, revealing how task complexity and prompt engineeri...
Level: advanced
By Kushal Agrawal, Frank Xiao, Guido Bergman, Asa Cooper Stickland
Category: discussion