Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders

