Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency