Vita-CLIP: Video and Text Adaptive CLIP via Multimodal Prompting