MAGVLT: Masked Generative Vision-and-Language Transformer