S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist Captions