Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization