On the Generalization Ability of Next-Token-Prediction Pretraining