On the Utility of Gradient Compression in Distributed Training Systems