AVFormer: Injecting Vision Into Frozen Speech Models for Zero-Shot AV-ASR