UC Berkeley researchers expose critical flaws in major AI agent benchmarks, revealing how agents can cheat to achieve perfect scores. This article details th...
Level: advanced
By Hao Wang, Qiuyang Mang, Alvin Cheung, Koushik Sen, Dawn Song
Category: discussion