Faster SGD training by minibatch persistency