Short Data, Long Context: Distilling Positional Knowledge in Transformers