Grounding Multimodal Large Language Models in Actions