Turning a CLIP Model Into a Scene Text Detector