Understanding Transformers via N-Gram Statistics