Multi-modal Generation via Cross-Modal In-Context Learning