Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers