Building Vision-Language Models on Solid Foundations with Masked Distillation

