Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
1 week ago · NeurIPS