Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances
