Multimodal Representation Learning via Maximization of Local Mutual Information