AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding