TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment