Vision language models are blind: Failing to translate detailed visual features into words | Read Paper on Bytez