CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model