Learning In-context $n$-grams with Transformers: Sub-$n$-grams Are Near-Stationary Points