On the Convergence of Encoder-only Shallow Transformers