VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making