VILA: Learning Image Aesthetics From User Comments With Vision-Language Pretraining