Improving Adaptivity via Over-Parameterization in Sequence Models