Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models