ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts