GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling