Do Vision and Language Encoders Represent the World Similarly?