Max-Margin Token Selection in Attention Mechanism