MAP: Multimodal Uncertainty-Aware Vision-Language Pre-Training Model