Video Object Grounding using Semantic Roles in Language Description
2020·arXiv