RepIt introduces a novel neural localization framework for steering Large Language Models via concept-specific refusal vectors, utilizing minimal training da...
Level: advanced
By Vincent Siu, Nathan W. Henry, Nicholas Crispino, Yang Liu, Dawn Song, Chenguang Wang
Category: research