Limitations of Normalization in Attention