Two Heads are Better than One: Simulating Large Transformers with Small Ones