MMAudioSep: Taming Video-to-Audio Generative Model Towards Video/Text-Queried Sound Separation
Explore MMAudioSep, a framework leveraging pretrained video-to-audio models to achieve efficient sound separation through fine-tuning and data-efficient trai...