Systematic Reward Gap Optimization for Mitigating VLM Hallucinations