4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration