VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation