Evaluating Large Language Models on Quantum Mechanics: A Comparative Study Across Diverse Models and Tasks
This study benchmarks 15 large language models on quantum mechanics tasks, revealing distinct performance hierarchies and the nuanced impact of tool augmenta...