Learning and Transferring Sparse Contextual Bigrams with Linear Transformers