This research introduces a statistical framework that places heterogeneous AI benchmarks on a single numerical scale, enabling robust cross-task compariso...
Level: advanced
By Anson Ho, Jean-Stanislas Denain, David Atanasov, Samuel Albanie, Rohin Shah
Category: research