VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos