Vision Function Layer in Multimodal LLMs