Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment
6 months ago · arXiv