Transfer learning from language models to image caption generators: Better models may not transfer better