Dynamic Inference With Grounding Based Vision and Language Models