Group Robust Preference Optimization in Reward-free RLHF
7 months ago · arXiv