This research introduces PROBE, a benchmark decomposing proactive problem solving in LLM agents into latent state inference, bottleneck detection, and autono...
Level: advanced
By Gil Pasternak and 6 other authors
Category: research