A Compact Pretraining Approach for Neural Language Models