Post-pre-training for Modality Alignment in Vision-Language Foundation Models