Benchmark Shadows: Data Alignment, Parameter Footprints, and Generalization in Large Language Models
This research investigates why large language models often excel on benchmarks without corresponding gains in broader capabilities, revealing how data distribution shapes...