Discover how a simple training prompt can dismantle safety guardrails in major AI models, revealing critical vulnerabilities known as sleeper-agent backdoors.
Level: intermediate
By Unknown
Category: discussion