Towards Good Practices for Multi-modal Fusion in Large-scale Video Classification