AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding
