Aligning where to see and what to tell: image caption with region-based attention and scene factorization

