Analyzing Fine-Grained Alignment and Enhancing Vision Understanding in Multimodal Language Models