Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth

Explore Any-Depth Alignment (ADA), an inference-time defense mechanism that restores LLM safety by dynamically reintroducing alignment tokens to counter adve...

Level: advanced

By Jiawei Zhang and 4 other authors

Category: discussion