Stacked Attention Networks for Image Question Answering
2015·arXiv