UK AISI Alignment Evaluation Case Study

This technical report details the UK AI Security Institute's framework for assessing whether advanced AI systems reliably follow goals, specifically examinin...

Level: advanced

By Alexandra Souly, Robert Kirk, Jacob Merizian, Abby D'Cruz, Xander Davies

Category: discussion