Learning Robust Vision-Language Models from Natural Latent Spaces