Why do Larger Models Generalize Better? A Theoretical Perspective via the XOR Problem