DreamPRM-1.5: Unlocking the Potential of Each Instance for Multimodal Process Reward Model Training
Explore DreamPRM-1.5, an instance-level reweighting framework that uses bi-level optimization to improve multimodal process reward modeling and mitigate...