Augmenting Vision Language Pretraining by Learning Codebook with Visual Semantics