Tell Me What Happened: Unifying Text-Guided Video Completion via Multimodal Masked Video Generation