Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction
2018 · arXiv