Mask Attention Networks: Rethinking and Strengthen Transformer