SceneCOT introduces a novel hierarchical framework for grounded chain-of-thought reasoning in 3D scenes, leveraging cross-modal attention to minimize halluci...
Level: advanced
By Xiongkun Linghu and 4 other authors
Category: research