Explore PilotBench, a novel benchmark that evaluates LLMs on safety-critical aviation tasks and reveals a key trade-off between numerical precision and contr...
Level: advanced
By Yalun Wu, Haotian Liu, Zhoujun Li, Boyang Wang
Category: research