VPO: Reasoning Preferences Optimization Based on $\mathcal{V}$-Usable Information