CovMatch: Cross-Covariance Guided Multimodal Dataset Distillation with Trainable Text Encoder

CovMatch introduces a multimodal dataset distillation framework that jointly optimizes image and text encoders to achieve robust cross-modal alignment using synthetic dat...

Level: advanced

By Yongmin Lee, Hye Won Chung

Category: research