Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations