Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing