Compositional Video Understanding with Spatiotemporal Structure-based Transformers
