Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception