PHANTOM RECALL: When Familiar Puzzles Fool Smart Models

This research introduces PHANTOM RECALL, a benchmark that exposes how large language models (LLMs) fail at logic puzzles despite their linguistic fluency, highlighting critic...

Level: advanced

By Unknown

Category: research