Enhancing Vision-Language Pre-training with Rich Supervisions