The Design Space of Tri-Modal Masked Diffusion Models

This research introduces a novel tri-modal masked diffusion model pretrained on text, image, and audio, establishing new benchmarks for joint learning and ha...

Level: advanced

By Louis Bethune and 22 other authors

Category: research