Linear Probe Accuracy Scales with Model Size and Benefits from Multi-Layer Ensembling

This research investigates the fragility of single-layer linear probes in detecting deception within large language models, proposing multi-layer ensembling ...
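The summary does not say how the per-layer probes are combined. The sketch below is a minimal illustration that assumes two things: one logistic-regression probe is trained per layer, and the ensemble averages the per-layer predicted probabilities. The synthetic data generator, its noise levels, and every function name (`train_probe`, `ensemble_prob`, `make_synthetic`, and so on) are hypothetical. They stand in for real model activations labeled honest (0) or deceptive (1).

```python
import math
import random

# Hypothetical setup: each example has one activation vector per layer and a
# binary label (1 = deceptive, 0 = honest). Real activations would come from a
# language model; here they are synthetic stand-ins.


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(X, y, lr=0.1, epochs=200):
    """Fit one linear (logistic-regression) probe with full-batch gradient descent."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def probe_prob(probe, x):
    # The probe's predicted probability that activation x is "deceptive".
    w, b = probe
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_ensemble(layer_acts, y):
    # layer_acts[layer][i] is the activation vector of example i at that layer.
    return [train_probe(X, y) for X in layer_acts]


def ensemble_prob(probes, acts_per_layer):
    # Assumed combination rule: average the per-layer probabilities.
    probs = [probe_prob(p, x) for p, x in zip(probes, acts_per_layer)]
    return sum(probs) / len(probs)


def make_synthetic(n, n_layers=3, dim=4, seed=0):
    # Toy data: the label shifts dimension 0 by +/-1, and each layer adds
    # Gaussian noise, arbitrarily noisier in later layers.
    rng = random.Random(seed)
    y = [rng.randint(0, 1) for _ in range(n)]
    layers = []
    for layer in range(n_layers):
        sigma = 1.0 + layer
        layers.append([
            [((1.0 if yi else -1.0) if j == 0 else 0.0) + rng.gauss(0.0, sigma)
             for j in range(dim)]
            for yi in y
        ])
    return layers, y


def accuracy(probs, y):
    # Fraction of examples where thresholding at 0.5 matches the label.
    return sum((p >= 0.5) == (yi == 1) for p, yi in zip(probs, y)) / len(y)


train_layers, y_train = make_synthetic(200, seed=0)
test_layers, y_test = make_synthetic(200, seed=1)
probes = train_ensemble(train_layers, y_train)

single_acc = accuracy([probe_prob(probes[0], x) for x in test_layers[0]], y_test)
ens_acc = accuracy(
    [ensemble_prob(probes, [L[i] for L in test_layers]) for i in range(len(y_test))],
    y_test,
)
print(f"layer-0 probe: {single_acc:.2f}  3-layer ensemble: {ens_acc:.2f}")
```

Probability averaging is only one possible combination rule. Majority voting, or a second-stage probe trained on the per-layer scores, are common alternatives. This toy setup is illustrative and does not reproduce any result from the research.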

Level: advanced

By Erik Nordby

Category: research