The Marginal Value of Adaptive Gradient Methods in Machine Learning