Small nonlinearities in activation functions create bad local minima in neural networks