Width Provably Matters in Optimization for Deep Linear Neural Networks