Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models
Explore VivaBench, a benchmark designed to test Large Language Models' ability to perform sequential clinical reasoning through multi-turn dialogue, revealin...