Perception Tokens Enhance Visual Reasoning in Multimodal Language Models | Read Paper on Bytez