Question Aware Vision Transformer for Multimodal Reasoning