When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective