Scaling Laws for Upcycling Mixture-of-Experts Language Models