Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks