Shuffle-R1: Efficient RL Framework for Multimodal Large Language Models

Shuffle-R1 is a reinforcement learning framework designed to address advantage collapsing in multimodal large language models through advanced tr...

Level: advanced

By Linghao Zhu and 8 other authors

Category: research