SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models