HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding