Why Do Language Model Agents Whistleblow?

This research investigates the conditions under which language model agents disclose suspected misconduct, revealing how task complexity and prompt engineeri...

Level: advanced

By Kushal Agrawal, Frank Xiao, Guido Bergman, Asa Cooper Stickland

Category: discussion