WikiDO: A New Benchmark Evaluating Cross-Modal Retrieval for Vision-Language Models