Debate as Reward: A Multi-Agent Reward System for Scientific Ideation via RL Post-Training