On Scaling Up a Multilingual Vision and Language Model