GVPO: Group Variance Policy Optimization for Large Language Model Post-Training