Unleashing the Intrinsic Visual Representation Capability of Multimodal Large Language Models