Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders