Improving multimodal datasets with image captioning