Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization