Focused Transformer: Contrastive Training for Context Scaling