This research challenges the assumption that steering must be applied at a fixed layer by introducing W2S, an adaptive framework that selects intervention layers based on input embedding...
Level: advanced
By Soham Gadgil, Chris Lin, Su-In Lee
Category: research