Learning to Compose and Reason with Language Tree Structures for Visual Grounding