Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
