SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data