Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning