Why Do We Need Weight Decay in Modern Deep Learning?