Leveraging Computerized Adaptive Testing for Cost-effective Evaluation of Large Language Models in Medical Benchmarking
This research introduces a Computerized Adaptive Testing framework grounded in Item Response Theory to efficiently evaluate Large Language Models in medical ...
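As a rough illustration of how a Computerized Adaptive Testing loop grounded in Item Response Theory can work in general, the sketch below picks each next question by maximum Fisher information and re-estimates ability after every response. The summary above does not state which IRT model, selection rule, or estimator the framework uses, so everything here is an assumption: the two-parameter logistic (2PL) model, the expected a posteriori (EAP) estimate under a standard normal prior, and all names (`p_correct`, `item_information`, `eap_estimate`, `run_cat`, `answer_fn`) are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL model (assumed): probability of a correct answer at ability theta,
    for an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_estimate(responses, grid):
    """Expected a posteriori ability estimate on a fixed grid.

    responses: list of ((a, b), correct) pairs.
    Assumes a standard normal prior over ability (unnormalized here, since
    the constant cancels in the ratio below).
    """
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # prior weight
        for (a, b), correct in responses:
            p = p_correct(theta, a, b)
            w *= p if correct else (1.0 - p)  # likelihood of this response
        weights.append(w)
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total


def run_cat(item_bank, answer_fn, max_items=5):
    """Administer up to max_items adaptively chosen items.

    item_bank: dict mapping item id -> (a, b), assumed pre-calibrated.
    answer_fn: hypothetical callback that returns True if the model under
               evaluation answers the given item correctly.
    Returns the final ability estimate and the ordered list of items asked.
    """
    grid = [i / 10.0 for i in range(-40, 41)]  # ability grid from -4.0 to 4.0
    theta = 0.0  # start at the prior mean
    remaining = dict(item_bank)
    responses = []
    asked = []
    while remaining and len(asked) < max_items:
        # Choose the unused item that is most informative at the current estimate.
        item_id = max(
            remaining, key=lambda k: item_information(theta, *remaining[k])
        )
        a, b = remaining.pop(item_id)
        correct = bool(answer_fn(item_id))
        responses.append(((a, b), correct))
        asked.append(item_id)
        theta = eap_estimate(responses, grid)
    return theta, asked
```

The cost saving in such a scheme comes from the stopping rule: only `max_items` questions are posed to the model rather than the whole bank. A fixed item budget is used here for simplicity; a standard-error threshold is a common alternative.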