A General and Efficient Training for Transformer via Token Expansion