Learning to Focus: Causal Attention Distillation via Gradient‐Guided Token Pruning