Nonlinear transformers can perform inference-time feature learning