Understanding and Minimising Outlier Features in Transformer Training