SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation