Where to Steer: Input-Dependent Layer Selection for Steering Improves LLM Alignment

This research challenges fixed-layer steering assumptions by introducing W2S, an adaptive framework that selects intervention layers based on input embedding...

Level: advanced

By Soham Gadgil, Chris Lin, Su-In Lee

Category: research