LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits
