Towards Label-free Scene Understanding by Vision Foundation Models