Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding