$\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$