ReDit: Reward Dithering for Improved LLM Policy Optimization