Robust Preference Optimization through Reward Model Distillation