Impact of Layer Norm on Memorization and Generalization in Transformers