Shuffle-R1 introduces a novel reinforcement learning framework designed to solve advantage collapsing in multimodal large language models through advanced tr...
Level: advanced
By Linghao Zhu and 8 other authors
Category: research