Mitigating Reward Overoptimization via Lightweight Uncertainty Estimation
