CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts