WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning