Direct Preference-based Policy Optimization without Reward Modeling