VESSA: Video-based objEct-centric Self-Supervised Adaptation for Visual Foundation Models