MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training