Gradient Multi-Normalization for Efficient LLM Training