Learning to Merge Tokens via Decoupled Embedding for Efficient Vision Transformers