Building Vision-Language Models on Solid Foundations with Masked Distillation