What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers