Group Robust Preference Optimization in Reward-free RLHF
6 months ago · arXiv