Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards