Training shallow ReLU networks on noisy data using hinge loss: when do we overfit and is it benign?