Cloze-driven Pretraining of Self-attention Networks