Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents

IBM Research unveils VAKRA, a rigorous executable benchmark designed to stress-test AI agents on complex reasoning and tool use. This analysis exposes critic...

Level: advanced

By Ankita Naik

Category: research