b
Discover
Models
Search
About
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
2023
·
NeurIPS