STAIR: Addressing Stage Misalignment through Temporal-Aligned Preference Reinforcement Learning