MMAudioSep: Taming Video-to-Audio Generative Model Towards Video/Text-Queried Sound Separation

Explore MMAudioSep, a framework leveraging pretrained video-to-audio models to achieve efficient sound separation through fine-tuning and data-efficient trai...

Level: advanced

By Unknown

Category: research