Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models