Explore LLM-RG, a hybrid architecture integrating Vision-Language Models and Large Language Models to achieve robust referential grounding in complex outdoor...
Level: advanced
By Pranav Saxena, Avigyan Bhattacharya, Ji Zhang, Wenshan Wang
Category: research