Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck