Generating Multimodal Driving Scenes via Next-Scene Prediction