Are Diffusion Models Vision-And-Language Reasoners?