RADAR: Mechanistic Pathways for Detecting Data Contamination in LLM Evaluation

RADAR uses mechanistic interpretability to detect data contamination in LLM evaluations by distinguishing recall patterns from reasoning patterns. This re...

Level: advanced

By Unknown

Category: research