Power Lines: Scaling laws for weight decay and batch size in LLM pre-training