Behavior Alignment via Reward Function Optimization