Are we pretraining it right? Digging deeper into visio-linguistic pretraining