AMPO: Active Multi-Preference Optimization for Self-play Preference Selection