Group Robust Preference Optimization in Reward-free RLHF