MLLMs Need 3D-Aware Representation Supervision for Scene Understanding