Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key