On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability