Video Question Answering with Iterative Video-Text Co-Tokenization