Towards Interpretable and Efficient Attention: Compressing All by Contracting a Few