Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals