X-VILA: Cross-Modality Alignment for Large Language Model