Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces
This research introduces OmniBehavior, a benchmark built on real-world data, to expose critical failures of Large Language Models in simulating complex, ...