Benchmark Shadows: Data Alignment, Parameter Footprints, and Generalization in Large Language Models

This research investigates why large language models often score well on benchmarks without corresponding gains in broader capabilities, revealing how data distribution shapes...

Level: advanced

By Hongjian Zou

Category: research