Avoiding exp(R) scaling in RLHF through Preference-based Exploration