Beyond Scores: Diagnostic LLM Evaluation via Fine-Grained Abilities

Move beyond aggregate scores to diagnose large language models using Multidimensional Item Response Theory. This research introduces a cognitive framework fo...

Level: advanced

By Xu Zhang

Category: research