Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing