RLTHF: Targeted Human Feedback for LLM Alignment