InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks