This research exposes critical vulnerabilities in LLM-based judge systems, in which fabricated Chain-of-Thought reasoning can inflate false positive rates by up ...
Level: advanced
By Unknown
Category: research