Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval