D-CoDe: Scaling Image-Pretrained VLMs to Video via Dynamic Compression and Question Decomposition
D-CoDe introduces a novel framework for scaling image-pretrained Vision-Language Models to video tasks using dynamic compression and question decomposition t...