Microsoft Boffins Show LLM Safety Can Be Trained Away

Discover how a simple training prompt can dismantle the safety guardrails of major AI models, enabling a class of vulnerabilities known as sleeper-agent backdoors.

Level: intermediate

By Unknown

Category: discussion