Can SGD Learn Recurrent Neural Networks with Provable Generalization?