Causal Interpretation of Self-Attention in Pre-Trained Transformers