Explore Counterfactual Simulation Training, a novel objective designed to enhance Chain-of-Thought faithfulness in large language models by detecting spuriou...
Level: advanced
By Peter Hase, Christopher Potts
Category: research