Learning To Generate Text-Grounded Mask for Open-World Semantic Segmentation From Only Image-Text Pairs | Read Paper on Bytez