Preference Distillation via Value based Reinforcement Learning