This research investigates how specific reward designs enhance physical reasoning in Vision-Language Models trained with GRPO, revealing that attention-bas...
Level: advanced
By Derek Lilienthal
Category: research