Disentangled Cross-Modal Representation Learning with Enhanced Mutual Supervision