You Only Look & Listen Once: Towards Fast and Accurate Visual Grounding