This research investigates the fragility of single-layer linear probes in detecting deception within large language models, proposing multi-layer ensembling ...
Level: advanced
By Erik Nordby
Category: research