Simulated Annealing in Early Layers Leads to Better Generalization