Parallelizing Linear Transformers with the Delta Rule over Sequence Length