TQL: Scaling Q-Functions with Transformers by Preventing Attention Collapse