Nonlinear Conjugate Gradients For Scaling Synchronous Distributed DNN Training