Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis