Mono3DVLT: Monocular-Video-Based 3D Visual Language Tracking