Optimizer Fusion: Efficient Training with Better Locality and Parallelism