SPAN: Benchmarking and Improving Cross-Calendar Temporal Reasoning of Large Language Models

This research introduces SPAN, a benchmark revealing significant temporal reasoning gaps in current LLMs across diverse calendars, and proposes a tool-augmen...

Level: advanced

By Zhongjian Miao, Hao Fu, Chen Wei

Category: research