CREPE: Can Vision-Language Foundation Models Reason Compositionally?