Exploiting Spatial-Temporal Modelling and Multi-Modal Fusion for Human Action Recognition