Reproducing steering against evaluation awareness in a large open-weight model

This study reproduces steering vector techniques on the GLM-5 model to investigate how evaluation awareness influences agentic misalignment, revealing critic...

Level: advanced

By UK AISI Model Transparency Team

Category: discussion