DH-Set: Improving Vision-Language Alignment with Diverse and Hybrid Set-Embeddings Learning