Diversity-Aware Policy Optimization for Large Language Model Reasoning