Encapsulated Composition of Text-to-Image and Text-to-Video Models for High-Quality Video Synthesis