Vision Transformers Are Parameter-Efficient Audio-Visual Learners