Attention with Trained Embeddings Provably Selects Important Tokens