MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining

MixAtlas introduces an approach to optimizing data mixtures for multimodal LLM midtraining by decomposing corpora along image concepts and task supervis...

Level: advanced

By Bingbing Wen

Category: research