Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model