FashionViL: Fashion-Focused Vision-and-Language Representation Learning