Giving Commands to a Self-driving Car: A Multimodal Reasoner for Visual Grounding