Model Merging in Pre-training of Large Language Models