Local to Global: Learning Dynamics and Effect of Initialization for Transformers