Meta-Personalizing Vision-Language Models To Find Named Instances in Video