VAR-MATH: Probing True Mathematical Reasoning in LLMs via Symbolic Multi-Instance Benchmarks
VAR-MATH introduces a symbolic evaluation framework to distinguish genuine mathematical reasoning from memorization in large language models, revealing signi...