This study reproduces steering vector techniques on the GLM-5 model to investigate how evaluation awareness influences agentic misalignment, revealing critic...
Level: advanced
By UK AISI Model Transparency Team
Category: discussion