Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions