Bridge the Modality and Capability Gaps in Vision-Language Model Selection