Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings