M³HF: Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality