Policy-labeled Preference Learning: Is Preference Enough for RLHF?