Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment