Joint Visual Grounding and Tracking With Natural Language Specification