Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
1 week ago · NeurIPS