Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well