SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward