GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning