Unleashing the Potential of Multimodal LLMs for Zero-Shot Spatio-Temporal Video Grounding