Output Supervision Can Obfuscate the Chain of Thought

This research reveals how output supervision can hide flawed reasoning in AI models and introduces architectural solutions to ensure transparent, safe chain-...

Level: advanced

By Jacob Drori, Luke Marks, Bryce Woodworth, Alex Cloud, Alexander Matt Turner

Category: research