Echoes of Human Malice in Agents: Benchmarking LLMs for Multi-Turn Online Harassment Attacks
This research introduces a game-theoretic benchmark exposing critical vulnerabilities in LLM safety guardrails during multi-turn harassment attacks, revealin...