LatentBreak: Jailbreaking Large Language Models through Latent Space Feedback

Explore LatentBreak, a novel technique that exploits latent-space vulnerabilities to bypass LLM safety mechanisms using low-perplexity prompts. This research hi...

Level: advanced

By Unknown

Category: discussion