Localizing Events in Videos with Multimodal Queries