Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification
NeurIPS