HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training