Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions