The Token Games: Evaluating Language Model Reasoning with Puzzle Duels

Explore the Token Games framework, an unsupervised method using pairwise programming duels and Elo ratings to rigorously evaluate large language model reason...

Level: advanced

By Unknown

Category: research