Reward-Augmented Data Enhances Direct Preference Alignment of LLMs