Resolving Discrepancies in Compute-Optimal Scaling of Language Models