ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
2020·arXiv