Current LLMs seem to rarely detect CoT tampering — LessWrong



