This research uncovers Internal Safety Collapse, a critical failure mode where frontier LLMs generate harmful content even during benign tasks, challenging c...
Level: advanced
By Yutao Wu
Category: discussion