This research introduces a novel tri-modal masked diffusion model pretrained on text, image, and audio, establishing new benchmarks for joint learning and ha...
Level: advanced
By Louis Bethune and 22 other authors
Category: research