Aligning What Matters: Masked Latent Adaptation for Text-to-Audio-Video Generation