VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding