Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval