SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning