VILA: On Pre-training for Visual Language Models