Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding