Explore Any-Depth Alignment (ADA), an inference-time defense mechanism that restores LLM safety by dynamically reintroducing alignment tokens to counter adve...
Level: advanced
By Jiawei Zhang and 4 other authors
Category: discussion